## You’ve got your whole life ahead of you

August 24, 2015

There’s a nicely morbid Perry Bible Fellowship comic showing a kid on his birthday:

That’s what you get with strict determinism: the day we die is already decided on the day we are born; on the day the universe was born, in fact. But since we ordinary mortals do not have access to Death’s all-powerful computing machine that lets him calculate the end of all tales (an abacus?!), we have to use statistics. Specifically, we like to use life expectancy, which is basically the average age at death of a certain population at a certain time.
Calculating it is rather easy for times far in the past. What was the life expectancy of infants born in 1900? Just look at all the people who were born in 1900, and take the average of their life spans. For modern times it’s a bit more complicated, since there are still so many yucky living people who spoil your statistics, but we get good results under reasonable assumptions.
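For the modern case, one standard technique (not necessarily the one behind the data sources used in this post) is the period life table: take the probabilities of dying at each age measured in a single year, and pretend a newborn will face exactly those odds for the rest of her life. Here is a minimal Python sketch. It assumes that people who die during a year of age live half of that year on average and that everyone still alive at the last tabulated age dies during it; the function name and the numbers fed to it are made up for illustration.

```python
def life_expectancies(death_probs):
    """Remaining life expectancy e_x at each integer age x of a period life table.

    death_probs[x] is q_x, the chance that someone alive at exact age x dies
    before reaching x + 1. Earlier entries must lie in [0, 1); the last entry
    is overridden to 1 so that the table closes.
    """
    if not death_probs:
        raise ValueError("need at least one age")
    if any(not 0.0 <= qx < 1.0 for qx in death_probs[:-1]):
        raise ValueError("q_x must be in [0, 1) before the last age")
    q = list(death_probs[:-1]) + [1.0]

    # l_x: fraction of the hypothetical cohort still alive at exact age x.
    alive = [1.0]
    for qx in q[:-1]:
        alive.append(alive[-1] * (1.0 - qx))

    # L_x: person-years lived between x and x + 1; those who die that year
    # are assumed to live half of it.
    lived = [lx * (1.0 - qx / 2.0) for lx, qx in zip(alive, q)]

    # T_x: person-years still to be lived above age x (summed from the top down).
    still_to_live, total = [], 0.0
    for Lx in reversed(lived):
        total += Lx
        still_to_live.append(total)
    still_to_live.reverse()

    return [Tx / lx for Tx, lx in zip(still_to_live, alive)]
```

With a hypothetical table where half of all babies die in their first year and nobody else dies until the last age, `life_expectancies([0.5, 0.0, 1.0])` gives `[1.5, 1.5, 0.5]`: surviving the first year does not shorten the road ahead at all.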

Here’s a cool thing about life: if you haven’t died so far (and, as you are reading this, I assume you haven’t), you will statistically outlive the average baby born in the same year as you. In other words, the older you are, the older you die. This is pretty obvious, but it’s a nice pat on the back as it shows your accomplishments in the field of “not dying”. Are you 30? Congratulations! You should no longer be afraid of dying from chicken pox, drowning in a bucket, baby measles or child malnutrition; and your chances of going to war are severely reduced. So you have to take all those deaths out of the equation, and the net result is that your projected age at death goes up.

But there is a contrasting force in this whole ordeal: the older you are, the less time you have to live, because you’ve already lived some portion of your life. The life expectancy of an infant born in the United States in 2013 is about 78 years. She has 78 years ahead of her. The life expectancy of a 90-year-old in the United States in 2013 is 94. She will die older (she has already passed the 78-year mark), but only has 4 years ahead of her.
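Both forces, "the older you are, the older you die" and "the older you are, the fewer years remain", can be seen on a toy cohort. A minimal Python sketch, assuming we know every cohort member's age at death in completed years; the cohort below is invented and the function names are hypothetical.

```python
def expected_age_at_death(lifespans, age):
    """Average age at death among cohort members still alive at `age`.

    `lifespans` are ages at death in completed years, so someone who died
    at completed age `age` was alive on that birthday and is included.
    """
    survivors = [t for t in lifespans if t >= age]
    if not survivors:
        raise ValueError(f"nobody in this cohort reached age {age}")
    return sum(survivors) / len(survivors)


def years_ahead(lifespans, age):
    """Expected remaining years for someone who has reached `age`."""
    return expected_age_at_death(lifespans, age) - age


# A made-up cohort of five people, two of whom die in early childhood.
cohort = [1, 2, 60, 70, 80]
for age in (0, 10, 40):
    print(age, expected_age_at_death(cohort, age), years_ahead(cohort, age))
```

In this invented cohort the expected age at death jumps from 42.6 at birth to 70 at age 10 and stays there, while the years ahead first rise from 42.6 to 60 and then fall to 30 by age 40.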

This leads to the question: what is “the best age to be alive”, in the sense that “your whole life ahead of you” is the longest period of time? This depends crucially on the mortality statistics: we can imagine that in a world where most children don’t make it to age 10, people who are 30 will have more to look forward to than five-year-olds.

In fact, this is what happened in Massachusetts during the 1850s (data found here):

In orange, we see how many years a person has left to live as a function of age in Massachusetts in 1850. We see that a baby just born has roughly 40 years ahead of her, while a child who made it to age 10 has about 47 years ahead of her! In the 1850s, every year of childhood you survived actually increased the number of years you had left to live! In view of the comic at the beginning of the post, for these children, every time they celebrate their birthday, Death should add a bead to their life total (statistically speaking, of course; deterministically we are all doomed anyway). Newborn babies and 20-year-olds can expect to live the same amount of time.
The yellow line plots the function y = x. The place where it crosses the orange line is the “half-life” point – the age where you have put as much time behind you as you have in front of you.
Finally, in blue, we see the average age at death. It continuously rises as a function of the age of the person in question. Notice that it is always above the yellow line y = x: this just means that when you look at a specific living person, that person will die older than she is right now. Unfortunately the data I have only goes up to 80 years, but eventually the blue line and the yellow line should coincide, at the age of the oldest person who ever lived.
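Finding the half-life point amounts to locating where the orange curve (years ahead) drops below the yellow line y = x. A small Python sketch, assuming the orange curve is given as a list of remaining-years values indexed by integer age, with linear interpolation between adjacent ages; `half_life_age` is a name made up for this example, and the input below is a made-up straight line, not the Massachusetts data.

```python
def half_life_age(years_ahead):
    """Age at which the years behind you first catch up with the years ahead.

    years_ahead[x] is the expected number of remaining years at integer age x
    (the orange curve). Interpolates linearly between adjacent ages and
    returns None if the curve never meets the line y = x within the table.
    """
    gaps = [remaining - age for age, remaining in enumerate(years_ahead)]
    for age, gap in enumerate(gaps):
        if gap == 0:
            return float(age)
        if age + 1 < len(gaps) and gap > 0 > gaps[age + 1]:
            # The two curves cross somewhere between age and age + 1.
            return age + gap / (gap - gaps[age + 1])
    return None


toy_curve = [40 - 0.5 * x for x in range(60)]  # made-up, linear decline
print(half_life_age(toy_curve))
```

For this hypothetical curve the crossing is at 80/3, about 26.7 years: at that age, roughly 26.7 years lie behind you and 26.7 ahead.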

But that was way back, a hundred and fifty years ago. With the advancement of modern medicine, food, and infrastructure, are things any different now? Indeed they are. Here is the United States data for 2013 (data found here):

First, life expectancy went up; that’s a no-brainer. But what’s interesting is that now the orange line no longer has a peak at the beginning. Further, the blue curve stays almost flat for the first 40 years or so of life: the difference between the expected age at death of someone who is 40 and of someone who was just born is a mere 4 years; contrast this with the whopping 29 years in 1850.
We have so completely decimated child and youth mortality that it no longer “pays off” to grow older, in terms of gains in life expectancy. A look at the data shows that this phenomenon (the missing orange peak) started in the 1920s, at least in the United States.
So the Death and the Abacus comic is indeed relevant – but only for the modern era. In the past, getting to 10 would have warranted a real celebration – for that is the age where “you’ve got your whole life ahead of you” carries the most weight. But today? The countdown starts with your first gasping scream for air.

## A primitive model for genetic recombination

August 17, 2015

Introduction:
I’m taking a class in general genetics at the Technion, and there we learned about genetic recombination, and in particular, homologous chromosome crossover: a phenomenon where chromosomes exchange sections between themselves during meiosis.
When this happens, some of the members of the population exhibit “recombinant genomes”, which differ from the genomes their parents should supposedly produce. Surprisingly, this part of the population never exceeds 50%, even though at first glance it seems as if it could.

In this post, we’ll see a model of chromosomal crossover statistics that explains this phenomenon, as well as giving an estimate of the physical distance between genes as a function of population statistics. I’ll assume you know some basic genetic terms, such as “dominant” and “heterozygote”, but I’ll explain crossovers in general and describe the problem in more detail below. You can skip directly to “The model basics” if you already know about recombination.
The post will be about 50% biology and 50% math.

Biological background:
We’ll work through an example with the commonly used traits of EYE COLOR and WING-SIZE in fruit flies. Both are controlled by single genes found on the X chromosome.
A fly’s eyes can be either red or white, with red being a dominant quality. We’ll mark the dominant gene with R and the recessive gene with r. Thus, if a fly’s genome contains Rr or RR, it will have red eyes; otherwise, if it contains rr, it will have white eyes.
Similarly, a fly’s wings can be either long or short, with long being a dominant quality. We’ll mark the dominant gene with W, and the recessive with w, so long winged flies have Ww or WW, and short winged flies have ww as their genotype.

Suppose we have a heterozygote cis female. That is, her genome contains both the dominant and the recessive alleles (so she has RrWw in her genome), and both of the dominant alleles are found on the same one of her two homologous chromosomes. In other words, her two X chromosomes look like this:

During meiosis, her two homologous chromosomes duplicate and then separate, and we get two types of possible germ cells: RW and rw:

However, it is also possible for crossover to occur: two chromatids are “sliced” at the same point, and then the pieces are swapped, so that the start of each one is glued to the end of the other.

If this happens during meiosis, the outcome is four possible germ cells: RW, Rw, rW, rw:

Now, what happens when we mate our RrWw female with a white-eyed, short-winged male? Since these traits are found on the X chromosome and a male fly has only one of those, his white eyes and short wings tell us that his single X carries both recessive alleles: rw. We don’t care about the Y chromosome here.

Upon mating, the male fly will give the offspring either an X or a Y chromosome. Let’s ignore the males at this point, and focus just on the females. Since our male’s genotype is rw, we will get the following combinations: RrWw, rrww, Rrww, rrWw. All of these are phenotypically different, and each represents a different combination of red/white eye and long/short wing. The Rrww and rrWw genotypes are recombinant – they only exist in the population because of recombination.

Suppose now that the chance for recombination between R and W is some number q between 0 and 1. Then if we look at a very large collection of germ cells from the mother, we expect the following distribution:

RW should be $\frac{1}{2}(1-q)$ of the germ cell pool
rw should be $\frac{1}{2}(1-q)$ of the germ cell pool
Rw should be $\frac{1}{2}q$ of the germ cell pool
rW should be $\frac{1}{2}q$ of the germ cell pool

This is because q of the population should be recombinant, and whenever there is recombination we get an equal amount of Rw and rW.
After mating, when looking at the females, we only need to add the father’s recessive genes, and we get:

RrWw should be $\frac{1}{2}(1-q)$ of the population
rrww should be $\frac{1}{2}(1-q)$ of the population
Rrww should be $\frac{1}{2}q$ of the population
rrWw should be $\frac{1}{2}q$ of the population

Thus, Rrww and rrWw comprise $\frac{1}{2}q+\frac{1}{2}q = q$ of the population. This can be measured in real experimental trials, since each of the above genotypes translates into a different observable phenotype.
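The bookkeeping above fits in a few lines of Python. A sketch, assuming the cis RrWw mother and rw father from the text; the function name is hypothetical, and genotypes are spelled in the same order as in the post (eye gene, then wing gene).

```python
def daughter_fractions(q):
    """Expected genotype fractions among daughters of an RrWw (cis) mother
    and an rw father, when R and W recombine with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    # Mother's germ cells: parental types split 1 - q, recombinants split q.
    eggs = {"RW": (1 - q) / 2, "rw": (1 - q) / 2, "Rw": q / 2, "rW": q / 2}
    # Every daughter also receives the father's X, which carries r and w.
    return {egg[0] + "r" + egg[1] + "w": share for egg, share in eggs.items()}


fractions = daughter_fractions(0.2)
print(fractions)
print("recombinant:", fractions["Rrww"] + fractions["rrWw"])
```

Whatever q we plug in, the two recombinant classes add up to q, which is what lets us read q straight off the counts of phenotypes.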
At this point in our theory, q can be any number between 0 and 1. If q is 0, then there is never any recombination, and the two genotypes RW and rw go hand in hand forever. If q is 1, then recombination always happens.
However, it is an empirical fact that the percentage of recombinant population is never more than 50%! The measured value of q is always less than or equal to 0.5.

There must be some mechanism that prevents recombination from happening too often. We can make appeals as to the utility of this mechanism, and wonder whether it is good or bad to have a small number or a large number of recombinations between genes – but for now, let’s try to think of an underlying model.

Image source: Wikipedia

The model basics:
We treat the chromosome as a linear piece of DNA, with a length of “1 chromosome”; in essence, it is a line segment of length 1. The different genes are points on this line, and are therefore assigned a position $0 \le pos \le 1$. In reality, genes have some finite width on the DNA strands, so a more accurate model would treat them as small intervals, but it will be easier to consider them as points.
We’ll assume that the gene that codes for eye color is on the left of the gene that codes for wing size. Denoting the position of the first by x and the second by y, we have this schematic for our chromosome:

The primary element in our model is the crossover event, or a cut. In this event, two homologous chromosomes are cut at a random place, distributed uniformly across their entire length. The chromosomes then swap strands at this position.

There are two options here. If the cut falls in the interval between x and y, the genes will be swapped, and we have recombination. However, if the cut occurs outside the interval [x,y], then those two genes will not be affected. Since the cut distribution is uniform, the chance of landing between the two genes is just y-x, so the probability of recombination is $q = y - x$.

This is a simple operation, and it’s tempting to think that it is the entire deal, but this is not so. If two genes are far away from each other, meaning at opposite ends of the chromosome, then the probability of recombination in a single crossover event is very close to 1: nearly every cut we make will separate them. But we never observe a q above 0.5! There is obviously something more that we are missing here.

Image source: Science magazine

The answer: the above description is true only for a single crossover event – a single cut. However, there is no guarantee that a chromosome will undergo any crossovers at all during meiosis. Further, a chromosome may actually undergo several crossover events, as was experimentally discovered when looking at the recombination relations between a triplet of genes on the same chromosome. But look what happens when there are two crossover events in the same interval [x,y]: the strands are switched twice, and ultimately there is no recombination between the two genes!

We can now convince ourselves that whether or not we see recombination between two genes depends on the parity of the number of crossover events that occurred between them. When looking at the population statistics, what we ultimately see is the probability that this number is odd.
As an artificial example, suppose that during meiosis, there is a 50% chance of performing a single cut, and a 50% chance of not performing any cuts at all. In that case, for two far away genes, which are always separated by any cut, there is a 50% chance of getting recombination, and 50% chance of not getting it. In other words, q was reduced from 1 to 0.5. In general, in this case the observed probability of getting recombination is $q = \frac{1}{2}(y-x)$, as half the time we do not get a recombination at all.
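The parity rule, together with the artificial 50/50 example, can be checked with a quick Monte Carlo simulation. A minimal Python sketch, assuming cut positions are uniform on [0, 1] as in the model; the helper names are hypothetical, and the trial count and seed are arbitrary.

```python
import random


def is_recombinant(cuts, x, y):
    """Recombination iff an odd number of cuts fall strictly between x < y."""
    return sum(x < c < y for c in cuts) % 2 == 1


def toy_meiosis(rng):
    """The artificial rule: 50% no cut at all, 50% exactly one uniform cut."""
    return [rng.random()] if rng.random() < 0.5 else []


def estimate_q(x, y, trials=100_000, seed=1):
    """Fraction of simulated meioses that recombine genes at x and y."""
    rng = random.Random(seed)
    recombinant = sum(is_recombinant(toy_meiosis(rng), x, y) for _ in range(trials))
    return recombinant / trials


print(estimate_q(0.2, 0.6), estimate_q(0.0, 1.0))
```

The first estimate should land close to $\frac{1}{2}(0.6-0.2) = 0.2$, and the second, for genes at the two ends of the chromosome, close to 0.5.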
Of course, there is no reason to assume that there is a 50% chance of getting no crossover event, and 50% of getting exactly one – the number of crossovers could behave in different ways – but we see that the actual percentage of recombinant population depends on the distribution of the number of crossover events in the chromosome. Which distribution should we choose?

A slightly flawed proposal:
A simple choice would be a binomial distribution. The reasoning goes as follows: during meiosis, there are all sorts of enzymes floating about the chromosomes, which are responsible for cutting them up and gluing them back together. There may be a large number n of these enzymes floating about, but they only have a certain probability p of actually performing their duty. Of course, we assume that they act independently, even though in reality they may interfere with each other. So the number of crossovers depends on the number of “successes”, where a success is an enzyme doing its work properly, which happens with probability p. This means that the number of cuts is distributed according to $C \sim Bin(n,p)$.

So assuming the number of crossover events is distributed according to $C \sim Bin(n,p)$, what is the probability of getting an odd number of crossovers? Let’s take a moment to calculate it.

For any $n$, denote that probability by $P_n$. Suppose you have already checked $n-1$ of the enzymes. With probability $P_{n-1}$ you already have an odd number of crossovers, so you need the $n$-th enzyme to stay idle, which happens with probability $1-p$. With probability $1-P_{n-1}$ you have an even number, and you need the $n$-th enzyme to perform a crossover, which happens with probability $p$. So the probability obeys the recurrence relation

$P_n = P_{n-1}(1-p)+(1-P_{n-1})p.$

with the initial condition that $P_0=0$, as if there are zero enzymes there are zero crossovers, which is an even number.
More nicely:

$P_n = P_{n-1}(1-2p)+p$

$P_0 = 0.$

If we look at just this equation:

$P_n = P_{n-1}(1-2p)$

we quickly see that the answer is $P_n= a \cdot (1-2p)^n$. However, we also have that additive $+p$ in our original equation. It turns out we only need a small adjustment to compensate for it: in this case we just have to add an extra constant, so that

$P_n = a \cdot (1-2p)^n + c.$

Since the equation is linear, this is very much like finding a particular solution of a differential equation, and we can find c directly by substituting the constant c for both $P_n$ and $P_{n-1}$ in the recurrence relation:

$c = c (1-2p) + p,$

which, for $p \neq 0$, gives

$c = \frac{1}{2}.$

The initial condition $P_0 = 0$ gives $a + \frac{1}{2} = 0$, so $a = -\frac{1}{2}$, and the solution is then

$P_n = \frac{1}{2} - \frac{1}{2}(1-2p)^n$

Wonderful! For very large n, the probability of getting an odd number of crossovers goes to 0.5! Even for relatively low probabilities p, the quantity $(1-2p)^n$ goes to 0 very quickly.
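As a sanity check of the algebra, the recurrence, the closed form, and a brute-force sum over all odd outcomes of $Bin(n,p)$ should agree. A short Python sketch using the standard library's `math.comb`; the function names are made up for this example.

```python
from math import comb


def odd_prob_recurrence(n, p):
    """Iterate P_k = P_{k-1} (1 - 2p) + p starting from P_0 = 0."""
    prob = 0.0
    for _ in range(n):
        prob = prob * (1 - 2 * p) + p
    return prob


def odd_prob_closed_form(n, p):
    """P_n = 1/2 - 1/2 (1 - 2p)^n."""
    return 0.5 - 0.5 * (1 - 2 * p) ** n


def odd_prob_brute_force(n, p):
    """Add up the Bin(n, p) probabilities of every odd number of cuts."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(1, n + 1, 2))


for n, p in [(1, 0.3), (5, 0.1), (40, 0.05)]:
    print(n, p, odd_prob_recurrence(n, p), odd_prob_closed_form(n, p),
          odd_prob_brute_force(n, p))
```

All three columns should match up to floating-point rounding, and for growing n they creep toward 0.5.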

This gives an answer regarding two genes which are very far away: they are affected by every cut performed by the enzymes, and so their recombination probability is exactly the same as the probability of getting an odd number of cuts. But what about genes which are closer? For them we actually have to take into consideration the fact that not every cut the enzymes make will fall between them.
Notice the following: the number of cuts in every chromosome is distributed binomially, $C \sim Bin(n,p)$. If we already know the number of cuts to perform, say k, then the number of cuts which affect the two genes at positions x and y is also distributed binomially as $Bin(k,y-x)$, since every cut has a probability of y-x of landing between the two genes. So the number of crossovers S between x and y, conditioned on $C = k$, is distributed as $Bin(k,y-x)$, and k itself is distributed as $Bin(n,p)$.
Now comes the cool part: there is a theorem about binomial distributions which says the following: if X is a random variable that is distributed binomially, $X \sim Bin(n,p)$, and Y is a random variable that, conditioned on X, is distributed binomially, $Y|X \sim Bin(X,r)$, then Y is also binomial, $Y \sim Bin(n, pr)$! Using this theorem, the number of cuts S which fall between x and y is distributed as $S \sim Bin(n, p \cdot (y-x))$.
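The binomial thinning theorem is easy to confirm numerically for specific values, by summing over the unknown total number of cuts (the law of total probability). A Python sketch, with `d` standing in for $y-x$; the function names and parameter values are made up for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def crossovers_pmf(j, n, p, d):
    """P(S = j) when C ~ Bin(n, p) cuts are made and each lands between
    the genes with probability d, i.e. S | C ~ Bin(C, d)."""
    return sum(binom_pmf(k, n, p) * binom_pmf(j, k, d) for k in range(j, n + 1))


n, p, d = 12, 0.3, 0.4
for j in range(n + 1):
    print(j, crossovers_pmf(j, n, p, d), binom_pmf(j, n, p * d))
```

The two columns should agree for every j, as the theorem says they must.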
Now we can apply the same reasoning as before, only this time, a “success event” is not merely when the enzymes perform a crossover anywhere on the chromosome, but rather when they perform it in some place between x and y.
The final probability of getting recombination between two genes is then

$q = \frac{1}{2} - \frac{1}{2}(1-2p(y-x))^n$

This is very nice, and it gives us some asymptotics as well. When $np(y-x)$ is large, the second term is negligible, and we have $q \approx \frac{1}{2}$. When $np(y-x)$ is small, we can expand the second term to first order, $(1-2p(y-x))^n \approx 1 - 2np(y-x)$; the two $\frac{1}{2}$’s cancel each other out, giving us $q \approx np(y-x) \propto (y-x)$.
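Both limits show up when evaluating the formula directly. A Python sketch with arbitrary, made-up parameter values (n = 50 enzymes, p = 0.1), comparing q with its first-order approximation $np(y-x)$; `q_binomial` is a hypothetical name.

```python
def q_binomial(n, p, d):
    """Recombination probability 1/2 - 1/2 (1 - 2 p d)^n, with d = y - x."""
    return 0.5 - 0.5 * (1 - 2 * p * d) ** n


n, p = 50, 0.1
for d in (0.001, 0.01, 0.1, 0.5, 1.0):
    print(f"d = {d:<5}  q = {q_binomial(n, p, d):.4f}  n*p*d = {n * p * d:.4f}")
```

For the closest genes q tracks $np(y-x)$ almost exactly; as the distance grows, the linear approximation overshoots while q itself levels off just under one half.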

Slightly improving the binomial:
Overall, the model proves adequate in its predictions, and its simplicity is alluring. However, it is not without problems. For example, its two parameters, p and n, must somehow be estimated, and it is not entirely clear how to do so. Indeed, the very presence of a fixed n seems out of place: by keeping it around, we assume that there is a constant number of enzymes at work, when it is much more reasonable that this number varies from cell to cell. After all, when manufacturing hundreds or thousands of enzymes, there must be variation in the numbers.

Luckily, there is a simple way to fix this, which is actually firmly based on reality. Instead of assuming that the number of cuts the enzymes make is distributed binomially, we assume it follows a Poisson distribution, $C \sim Pois(\lambda)$, for a yet unknown $\lambda$. This actually makes a lot of sense when we remember that Poisson distributions are used in real life to describe queues and manufacturing processes, where what we know is the average rate at which events occur.
If the number of overall cuts has a Poisson distribution, how does the number of crossovers between x and y behave? Well, given that the number of cuts is k, the number of crossovers is still as before, $Bin(k, y-x)$. But again the theorems of probability smile upon us, and there is a theorem stating that if $C \sim Pois(\lambda)$ and conditioned on $C = k$ we have $S|C \sim Bin(C,y-x)$, then

$S \sim Pois(\lambda(y-x)).$

So the distribution of crossovers between x and y will also follow a Poisson distribution!
Now we only have to remember the simple trick, that

$Pois(\lambda)= \lim_{n \rightarrow \infty} Bin(n,\frac{\lambda}{n}).$

Thus, under the assumption of a Poisson distribution, the final probability of getting recombination between two genes is

$q = \frac{1}{2} - \lim_{n \rightarrow \infty} \frac{1}{2}\left(1-\frac{2 \lambda(y-x)}{n}\right)^{n},$

or, more simply,

$q = \frac{1}{2}\left(1-e^{-2 \lambda (y-x)}\right).$

This again has the same desirable properties as before, but the model is simpler: we got rid of the annoying n parameter, and the probability parameter p was replaced by the rate parameter $\lambda$.
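The Poisson version can be simulated directly, without ever choosing n: scatter cuts along the chromosome as a Poisson process of rate λ (exponential gaps between consecutive cuts), count how many land between the genes, and check the parity. Under this model the recombination probability should come out as $\frac{1}{2}(1-e^{-2\lambda(y-x)})$. A minimal Python sketch; the function names, rate, gene positions, trial count and seed are arbitrary choices for illustration.

```python
import math
import random


def poisson_cuts(rate, rng):
    """Cut positions on [0, 1] from a Poisson process with the given rate.

    Exponential gaps (mean 1/rate) between consecutive cuts make the total
    count Pois(rate), with the cuts spread uniformly along the chromosome.
    """
    cuts = []
    position = rng.expovariate(rate)
    while position < 1.0:
        cuts.append(position)
        position += rng.expovariate(rate)
    return cuts


def simulated_q(rate, x, y, trials=100_000, seed=2015):
    """Fraction of simulated meioses with an odd number of cuts in (x, y)."""
    rng = random.Random(seed)
    odd = 0
    for _ in range(trials):
        odd += sum(x < c < y for c in poisson_cuts(rate, rng)) % 2
    return odd / trials


def predicted_q(rate, distance):
    """The Poisson-model prediction 1/2 (1 - exp(-2 * rate * distance))."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * distance))


rate, x, y = 3.0, 0.2, 0.5
print(simulated_q(rate, x, y), predicted_q(rate, y - x))
```

The simulated and predicted values should agree to within sampling noise (a few thousandths with this many trials).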
(Note: For small values of (y-x), the probability of recombination is $q \approx \lambda(y-x)$; if only we could set $\lambda = 1$ and get a direct relationship between q and the distance between genes…)
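Turning the Poisson formula around gives the promised estimate of physical distance from population statistics: solving $q = \frac{1}{2}(1-e^{-2\lambda(y-x)})$ for the distance yields $y - x = -\frac{\ln(1-2q)}{2\lambda}$. A Python sketch, where the hypothetical `rate` parameter plays the role of λ and defaults to 1, as in the wishful note above; since λ is still unknown in practice, the result is only meaningful in units that depend on it.

```python
import math


def distance_from_q(q, rate=1.0):
    """Solve q = 1/2 (1 - exp(-2 * rate * d)) for the gene distance d.

    Only defined for 0 <= q < 1/2: as q approaches 1/2 the estimate grows
    without bound, since far-apart genes all look like a fair coin toss.
    """
    if not 0.0 <= q < 0.5:
        raise ValueError("observed recombinant fraction must be in [0, 0.5)")
    return -math.log1p(-2.0 * q) / (2.0 * rate)


for q in (0.01, 0.1, 0.3, 0.45):
    print(q, distance_from_q(q))
```

For small q the estimate is close to q itself (q = 0.01 gives about 0.0101), while near 0.5 small changes in the measured fraction translate into huge changes in the estimated distance.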

To conclude:
The percentage of recombinant phenotypes in a population of offspring never exceeds 50%. This is not an arbitrary number, but stems from the underlying biological mechanism of recombination. Because multiple crossover events can occur between two genes, what’s ultimately important is the parity of the number of such events. When the total number of crossovers in a chromosome follows a Poisson distribution, the probability of an odd number of crossovers can be readily computed. It behaves like a fair coin toss for genes which are physically far away from each other, but is approximately linear in the distance between the genes when they are close to each other.
This “Poisson crossover model” is very simple, and of course does not explain all there is about recombination (genes are not points on a line; distribution is probably not Poisson; events are not independent; there are “recombination hotspots” in chromosomes; the chromosome is a messy tangle, not all of which is accessible; etc). But it looks like a good starting point, and to me seems adequate at explaining the basic behaviour of recombination.

## Book review: Imperial Earth

August 8, 2015

According to our beloved free encyclopedia, “Hard science fiction is a category of science fiction characterized by an emphasis on scientific accuracy or technical detail, or on both.”
When I (discreetly) mentioned to my friend that I might be interested in that sort of thing, he suggested reading Arthur C. Clarke’s “Rendezvous with Rama”. I did, and found it glorious. But perhaps we should mention another book by Clarke as a runner-up for the distinguished merit award in that category. Here is an excerpt, recounting the protagonist’s experience at a fancy party:

Yes folks, your eyes deceive you not. As for me, I was rather lucky – for just half a year ago I took the third-year, fifth-semester course in solid state physics in which students learn about the Drude-Sommerfeld model for electron conduction in lattice-based systems – so I was able to calculate Duncan’s mean free path and draw a mental picture of his quantum scattering.
Intrigued? Want to read a sci-fi political novel while simultaneously taking a crash course in radio astronomy and polyominoes? Say no more! I present to you, Imperial Earth!

There is an entire chapter on pentominoes – a cool math puzzle that involves fitting twelve Tetris-like pieces together to form various shapes. No ink is spared when describing the combinatorial explosion problem and the comparison between brute force computing and human creativity.
There are several long sections describing the difficulties of picking up long wavelengths on Earth, and how mankind can cope with them. There are considerations of the harsh reality of space travel and communication. There are even numerous paragraphs on interior design (of underground accommodation facilities on Saturn’s moon Titan, of course).
Of course, I exaggerate a bit. There’s still a plot, some character development, and a few mysterious schemes. But the reader who expects too much in this area will come out disappointed – these are not the book’s strong points. Rather, they serve mostly as an excuse for Clarke to show off and preach some of his thoughts and ideas about future society.
And this indeed is how you should approach the book: as a collection of ideas. Duncan travels through a world where we have started colonizing the planets; this takes both time and technology, and Clarke fills in the gaps as he wills. Some ideas are textbook standards, like a unified world government that (somehow) manages to keep the peace on Earth. Others are more amusing with roots in the Classics, such as having the US president be picked at random from a pool of potential candidates: one of the required qualifications is not wanting to be president, and successful presidents get “time off for good behaviour”. And of course, there are more unique ones, such as Earthmen’s obsession with preserving the past, despite (or perhaps because of) the marvellous technological advances.
It is true, some of the themes and events are underdeveloped, and the book cannot go down every branching path it presents. Various holes scatter the landscape, both in the overall plot and in the little details that make the world tick: it is nice to have US presidents who are picked at random, but how this system stays in faithful unbiased hands is left for us to wonder.
But perhaps that’s a good thing.

## Our beautiful

August 3, 2015

I recently got back from a sixteen day vacation in Croatia. Croatia is a beautiful country, renowned for its lush forests, crystal lakes, stunning mountains, and dreamy coast. The natural thing to do, therefore, is to stay cooped up for almost two weeks in the Gymnasium of the small town of Požega, and teach three high schoolers about laser displays.

Sounds a little similar to last time, I know. But this time, unlike the last, I actually got to see some of those lush forests, crystal lakes, stunning mountains, and dreamy coast. Indeed, after the hectic Summer School of Science, I finally managed to get a road trip in Our Beautiful Homeland. Here are some photos; a more technical description of the school might appear later.

It all started in Zagreb. Cloudy with a chance of sweat and Egyptians.

Zagreb’s main square is truly a wonderplace. Not a day goes by without some artistic or cultural event. I have at least one more post I want to write about some interesting cultural phenomenon I saw there. But that will wait for another time, as the Summer School of Science was just around the temporal corner.

After a 7 half-hour bus ride to Požega, S3 started. I will not go into detail here, just post pretty results: the students built laser scanners using speakers, and used them to generate Lissajous curves (and more!).

After the camp there followed several more half-hours of bus riding, followed by even more bus riding. Overall, buses are a good place to be. Some of them even have wifi.
But all that sitting, squeezed inside various means of public transportation, certainly paid off. For now, in prophetic order, we have our sightseeing. Lo and behold the prophetic order!
Lush forests and crystal lakes:

Stunning mountains:

Dreamy coast:

Ah, Croatia, you are as mysterious as you are beautiful. It’s almost needless to say, you truly have captured me.

## Book review: The Andromeda Strain

July 9, 2015

*** spoiler included ***

Do you know that feeling where you say to yourself “I’ll just check out this article on wikipedia” and then five hours later when you finally raise your head from the screen and gasp for air after having dredged half the internet you cannot help but wonder “where the hell did seven hours go?!”?
That is “The Andromeda Strain” by Michael Crichton, for the better and worse of that statement. On the one hand, it’s a page turner; you’ll have to take care not to tear the pages as you blaze through them faster than the speed of sound. On the other hand, at the end, you’ll sort of want those seven hours back.

Crichton wrote a very engrossing and thrilling book. Merely the basic premise – an emergency team handling the outbreak of an alien microbe – commands us to think for a moment how complicated indeed First Contact would be with any extraterrestrial race. This is an interesting and thought provoking topic, and science fiction is undoubtedly filled with contact books, speculating on an entire range of scenarios. Meeting with a disease is an original one, that seems obvious in hindsight in a satisfying kind of way. The possibility of an alien pandemic that threatens Earth’s entire population with near instant death or insanity is definitely page-turning material. The book also imitates the form of a classified report, and this adds realism to the sense of what-will-happen-next exhilaration.
It’s too bad that as you turn the final pages, exhilaration turns to disappointment, and all the pent-up tension, all the built-up potential energy dissipate to nothing. The disease simply and spontaneously “turns dormant”. Nothing happens to major American cities. Millions of lives are not compromised. No megapolis is evacuated. The scientists working on the project did not save the day; in fact, disaster was averted only because their advice was ignored. This, despite the fact that the book constantly warns you, “and then the scientists made their second, crucial mistake”.
“Ok”, you might say, “so the book focuses mainly on the investigation going on in the research laboratory, instead of the dangers of the outside world. How is that so different from Rendezvous with Rama?” Well, there are at least two differences.
First, in Rama the exploration is so obscenely fascinating, and mankind’s technology is so pitifully crude compared to the Ramans’, that the team’s feeble attempts just multiply the overwhelming awe conjured by the book. The whole point of Rama was exploration. By contrast, The Andromeda Strain sells itself as one about preventing a major disaster, but ultimately doesn’t deliver on that front. The scientific investigation is exciting, but is not sustainable on its own without the knowledge that failure to contain the outbreak will have catastrophic consequences.
Second, there’s a lot of science in this book, but it falls short of convincing the slightly trained eye. While some of the methods used by the investigation team are Freakin’ Cool, the scientists sometimes perform tests in a manner so sloppy you would expect better from a freshman undergraduate. A freshman majoring in History, mind you. And the book uses “evolution of microbes” in such a grossly wrong way that it would have been better off just blaming it all on “the almighty hand of creationism”.

But do not take this review too harshly, dear reader. If disappointment is the main flaw of this piece, then the real problem is just the expectation. Give this book a try! You’ll read it quickly enough. Just keep in mind that today is not the day for a post-apocalyptic Andromeda-quarantined America.

## Skin and bits

June 30, 2015

My wife and I have a problem.
Well, I mean, not a problem, per se. We are “of divided opinions”.
She wants to have a baby. I want to program. The two are obviously incompatible.
And it’s not that she doesn’t like to program, she loves it very much, coding away her will and command; it’s just, she also wants to have a kid.

We went to a marriage counselor. The psychologist looked at me like I was crazy and said that the desire to have a baby is a natural human quality, but the desire to program is not a natural human quality. Well, one thing’s for sure, he’s not getting any more of my money.
I took her to the hospital, to the place where they keep all the babies. I had thought that the sight of dozens of screaming amorphous lumps would sober her up. Instead, she literally melted. Every additional screaming amorphous lump she saw made her eyes grow even puppier. It turns out it’s a maternal instinct or something, to turn into a dripping puddle at the sight of screaming amorphous lumps. We eventually left – I mean, the staff drove us out yelling “how the hell did you get in here?!” – with the sole achievement of further increasing her desire for a child.
I reminded her that potential employers frown upon pregnant women. I noted that while “I reared children and brought them up, but they have rebelled against me”, no prophet ever foretold “I compiled programs and linked them up, but they segfaulted against me.” I warned her against the physical dangers embedded in childbirth. I explained that the open source community is full of comments, forums and technical support, but mankind’s source code is proprietary, undocumented, and can only be understood through reverse engineering. I remarked that a git-commit can always be reverted, but you cannot push a baby back into the womb. I elaborated the finest examples and claims, and generally flung such powerful and crushing arguments for my cause, such convincing and uncompromising rhetoric and logic, that she hadn’t even a speck of a chance to refuse them.
She refused them.
She tried to explain something about finding happiness in life and the miracle of childbirth, or something like that, I don’t know, I didn’t really pay any attention. I was confident that I was in the right. Which was too bad, since she was confident that she was in the right.
We had a little bit of a fight over it. Plates were thrown. They were plastic, so nothing broke – I knew buying them would pay off eventually. We agreed to separate for a week and think things through on our own. Well, I mean, she agreed, I didn’t have much of a choice. I rented a room in a tattered downtown hotel. I took my laptop with me, and they had free wifi, so instead of thinking things through on my own, I mostly programmed.
But at night, sitting on the sweat-soaked bed, with the ceiling fan slowly turning as if caught in trance, creaking every three turns, I couldn’t help but think things through on my own. Think about all sorts of things. On kids and the desire to leave a mark in the world. On the meaning of life and the fact that we are just microscopic grains of dust against the vast cosmos. On whether node.js was really the right choice for my server-side. On the joy of seeing your own flesh and blood grow up and learn to walk and talk and decide to get body piercing in dubious places. On whether it’s worth losing the woman you love for code. On whether it’s better to use a red-black tree or an AVL tree.
For many an hour, I sat.
Thoughts flowed through and around me, and with them, answers.
After a week I came home. She hadn’t changed the locks!
“I have a solution!” we both said at the same time.
“You first!” we both said at the same time.
Ok, in these cases you need to choose a random number between 1 and 20 and wait that many seconds. I chose 4, which is the standard random number according to IEEE. She chose a larger one.
“I was such an ass,” I said.
“Yes, you were an ass,” she said. “But I have a solution.”
“It’s so obvious.”
“It feels so right.”
“All this time, it was just sitting right in front of us.”
“Where did this stupid idea come from that the desire to have a baby and the desire to program are mutually incompatible?”
“Why not combine the love of kids with the love of bits?”
“It’s such a wonderful idea! I’m so happy you think like I do!”
“I love you.”
“I love you.”
“Let’s go do it, right now!”
And so, my wife and I embarked on the most magical journey that a man and a woman can experience as a couple: writing our first artificial intelligence together.

## Untraining your bias

June 18, 2015

Did you know? Women are underrepresented in mathematics and science oriented faculties in Israel. In fact, to say “women are underrepresented” is a HUGE underrepresentation of how few women there are. For example, in the Technion’s mathematics faculty, you can count the number of female members on three fingers, but the total faculty member count is almost 70 (this includes the emeritus and retired professors, which is generally ok, since why shouldn’t there be female emeriti? Anyway, even without them, the percentage of women is in the single digits). The statistics aren’t much better for other faculties and universities.
This was in the news lately, as the ministry of science released a report on the issue. This is a Good Thing: now people are aware of the problem, and will positively definitely surely immediately act to do something about it. At least, the minister of science said he would!
(Though you can bet that so did all the previous ministers of science. That’s not to say that things aren’t better than they were before, and that no efforts were invested – I’m sure many efforts were invested, and surely things are (slightly) better than before. It just takes a lot of time to fix these things, and the reasons for the underrepresentation are probably deeply rooted inside both Israeli culture and its education system, and will not be changed by a simple government program but rather by prolonged erosion as older generations are replaced by (hopefully) more equality-oriented fresh ones).

But being aware of the problem is half the solution, right? And now you, although not a minister of science just yet, can do your part to complete the other half.

What can you do, anyway, at least on the personal level? There are many apparent reasons as to why women are underrepresented in academia and industry, but the first thing you can do is to stop discriminating against them.
“Me? Discriminating against women?! Blasphemy, heresy, and outright profanity! To call me a misogynist is an injustice so immense, just to think it should result in indictment!”
Whoa there. That might be true, but it turns out that, compared to men, women are discriminated against when being evaluated on their work, performance, academic record, lecturing, writing, reading, lion taming, dressing habits, or pretty much anything you can think of. I take as an example a notable study found here. In this study, members from various faculties of different universities were asked to rank application materials of a student applying for a laboratory manager position. All faculty members received exactly the same c.v and statement letters to review, apart from one difference: in half the forms the name of the student was that of a male (it was “John”), and in the other it was female (it was “Jennifer”). They gave scores on “competence”, “hireability”, “deserving-of-mentoring” and salary. Here is one graph from that paper:

The application forms were identical, yet the results – not so: male students had better rankings on average than female ones. One possible conclusion is that just seeing a female name on top of a c.v causes people to rank the applicants lower / seeing a male name causes them to rank higher (there may be other possible conclusions, and the study itself may be biased / shoddy; I admit I am no expert in statistics and experimental methods in psychology, and cannot judge the quality of the study better than your average jack-o’-the-mathematics-student. For the rest of this post, I assume the above conclusion is true).

A good question to ask is, “how many people are susceptible to this bias? After all, most people aren’t in HR, and don’t have to review c.v’s and interview people all the time.” Of course it depends on where you are in life both spatially and temporally. In academia, at least, it’s almost inevitable to have to do something similar at some point in your life, even as a grad student: for example, teaching assistants assist in checking homework assignments and exams (the latter are supposed to be anonymous for precisely the reason of reducing biases). Then again, most people aren’t in academia, either.

Knowing this happens, can we overcome it? I think it’s safe to say that many biases such as these are subconscious, in that it seems odd to me that reviewers sit down and genuinely say to themselves, “Oh! I see a woman’s name here! Curse this curriculum vitae, it belongs to an inferior and devilish creature and should be burned at once.” (alternatively, “Oh! I see a man’s name here! This blessed angel will receive all the gifts my prowess can gather”, but we’ll stick with the female-negative instead of male-positive version. To distinguish between the two, the study should have also sent neutral / nameless application forms).
Indeed, it is hard to openly declare “This candidate is a woman, and for that she must suffer! Minus 10 points for Gryffindor”, because Western society today, for the most part, denounces that kind of activity. However, it’s much easier to just say to yourself, “I don’t know, I don’t feel very strongly about this candidate-who-happens-to-be-female”, and subtract 10 points regardless. Probably the decision to rate the applications as they did was made at a far deeper level.
But fighting subconscious decisions is hard. Unless we have very well defined metrics, e.g. we accept candidates based exclusively on the number of publications (in which case we can have contemporary machines grade and sort them), there will always be a “gut feeling” in part of the selection process. Many times this can be a good thing, since “gut feeling” is based on your previous experience and helps make generally not-so-bad choices when facing decisions with many uncertainties and missing information (and there is a lot of missing information here; we have reduced the candidate’s entire life to a statement essay and some recommendation letters). However, if that gut feeling is tainted with ethnic and gender biases, how are we to know how strong they are? How are we to differentiate between Good Gut Feeling and Sexist Gut Feeling?

The proper thing to do is just find out our biases and fix them. To think to yourself, “What are the objective qualities I am looking for? What merits and flaws does this woman have? If this had been a man, would I have rated him any differently?” But this is hard, since this is not a man, and the Golden Objective Qualities will not always reveal themselves to you, and as you review the application with all these things in mind, there will be a nagging voice at the back of your head whispering in distress, “this is a female, damn it don’t screw this up, you need to rate her fairly, does that mean giving +5 points by default, no, that would be biased as well, what to do what to do what to do?!” The resultant evaluation might be too high, because of the positive anti-bias you are trying to apply, or it might be too low, because of the fear of such a positive anti-bias.
It’s sort of like performing a mechanical task, such as walking, or playing the piano, or hitting a baseball. If you stop to think about how you are going to do it, you will fumble. Once you become self-conscious about it, you have walked into a pitfall.

But herein, I think, lies a possible key to success. In order to overcome bias, we must refrain from actively thinking about that bias, yet still evade it when it comes. The only way to do that is to actively train yourself, so that when the moment of truth arrives, you will do it passively. Naturally.
This would be an obvious “given” in a society where gender equality is the norm. In that case, your whole life you would observe unbiased decisions and non-discrimination. Living a regular life would be considered “training” in itself. But that’s not the case today (evidently), so as in all mechanical learning, we must train.

What I’m offering is: the brain works in mysterious ways. It’s capable of meta-thought, and that meta-thought disrupts it. However, it’s great at learning habits, and habits can be reinforced by training, i.e. by feedback. With enough feedback, using some inner mechanisms-which-we-don’t-yet-understand, our brain can master all sorts of skills; some of which we have a very hard time replicating algorithmically and precisely. Why not use this ability to learn how to rate job applications with equality?
(Just some examples of our majestic abilities: face recognition, language acquisition, playing tennis, finding rhymes, and understanding sarcasm. You may object that the brain is hardwired explicitly to solve some of these; that’s ok, we still require a learning process to use them. Also, no need to go into details of different algorithms which do attempt to solve these problems).

This requires a training system. I propose the following:

Introducing www.ungenderbiasme.com! (patent pending; link not yet up). On this site you can methodically train to remove gender biases when reviewing job applications and c.v’s. It’s quite simple: you log into the site and start reviewing both male and female applications (either randomly generated, or selected real ones). Like the study cited earlier, the same application forms are reviewed by many people around the world, with the only difference being the name and gender of the applicant. So we suppose that when you rank an application, 5000 people from around the globe will rank the same form, assuming the applicant is female; another 5000 will rank the form, but assuming the applicant is male.

Of course, as described above, during this ranking process you will be gender conscious. You will fret and wonder if you are unconsciously overranking or underranking. You will make mistakes. That’s ok though, as the whole purpose of the site is to let you know when you are doing it too much or too little. Once you submit your ranking, your scores are compared to the scores of everyone else in the world who reviewed the same application form as you.

There are several possibilities as to how to do this; here is one. Suppose you rank a female applicant. Your scores are then calibrated against the scores of the people who ranked the same female applicant. If you ranked significantly higher than, say, 95% of the others, you might just tend to be generous; if you ranked significantly lower than most of the others, you might just have higher standards. This calibration is cumulative and takes into consideration all your previous rankings as well.
After the calibration, your scores are compared to those of the people who ranked the same male applicant as you did. Now you can see if you are gender biased! (at least compared to the rest of the population). After every form you review, you receive immediate feedback: “You scored this female candidate *way* higher than others scored the corresponding male candidate. Perhaps you overshot in your attempt to fix the world?”.
Effectively, the site provides an indicator as to how gender-biased a reviewer you are. By doing many such reviews (of course they are all short and fun to do), you learn to correct yourself by trial and error. Do this enough times, and you will no longer have to think about being unbiased; it will become part of your review process. Hopefully this will also trickle into the rest of your life, and not just the Job-Application-Checker side of you.
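Since the site doesn’t exist, here is a minimal sketch of the two-step feedback loop described above, under my own assumptions: scores are plain numbers, your “generosity” is measured as the average amount by which your past scores exceeded the crowd’s mean for the same form and gender (a simpler stand-in for the percentile idea), and “significantly different” means more than two standard deviations away from the scores the other-gender version of the form received. The names `Reviewer` and `review` are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Reviewer:
    """Hypothetical reviewer record: (own score - crowd mean) for each past review."""
    deviations: list[float] = field(default_factory=list)

    def generosity(self) -> float:
        # Average amount by which this reviewer out-scores the crowd; 0 with no history.
        return mean(self.deviations) if self.deviations else 0.0


def review(reviewer: Reviewer, score: float,
           same_gender_scores: list[float],
           other_gender_scores: list[float],
           threshold: float = 2.0) -> tuple[float, str]:
    """Return (gap in standard deviations, feedback message) for one submitted score."""
    # Step 1: calibrate by removing the reviewer's habitual generosity or strictness,
    # estimated from previous reviews only.
    calibrated = score - reviewer.generosity()

    # Step 2: compare with people who saw the same form under the other gender.
    mu = mean(other_gender_scores)
    sigma = pstdev(other_gender_scores) or 1.0  # avoid dividing by zero
    z = (calibrated - mu) / sigma

    # Record this review against the same-gender crowd, for future calibration.
    reviewer.deviations.append(score - mean(same_gender_scores))

    if z > threshold:
        msg = "You scored this candidate way higher than others scored the other-gender version."
    elif z < -threshold:
        msg = "You scored this candidate way lower than others scored the other-gender version."
    else:
        msg = "No significant gender gap detected on this form."
    return z, msg


if __name__ == "__main__":
    # A reviewer who is habitually about 1 point more generous than the crowd.
    r = Reviewer(deviations=[1.0, 1.0])
    z, msg = review(r, 10, same_gender_scores=[5, 6, 7], other_gender_scores=[6, 7, 8])
    print(round(z, 2), msg)  # 2.45, "way higher" message
```

Subtracting the reviewer’s own habitual offset first is what keeps a simply generous reviewer from being flagged as biased; only the gap that remains after calibration counts. Whether the crowd is a trustworthy reference is exactly the objection raised next.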

The naysayers will say that there are some difficulties here. You compare yourself to the rest of the world, but who said that the world knows what it’s doing? After all, if all was well, we wouldn’t need this site in the first place. Also, what happens when people start getting better? Does this model incorporate the fact that as time goes on, the world’s population will get better at unbiasing themselves? There should also be some stronger incentives to use this site than “you-will-be-a-morally-better-person-afterwards”, as many times people don’t seem to care about that (maybe it should be mandatory in order to get a professorship in all universities?).

But most importantly, does this scheme even work? Who said that this gender bias, as a cognitive behavior, is similar to the mostly-mechanical skills I listed such as hitting a baseball? Who said that it can be trained this way? Even if it could, what guarantees that this specific method will work?

So the site isn’t up yet, and for good reason. Instead, I call out to you, fellow psychology and human behavior researchers! A clinical trial on the trainability of gender biases is yearning to be held. Surveying the harsh-to-be-a-woman environment of today, surely you have the incentive; now just find the money and time, and start performing world-bettering research!

(Alternatively: perhaps such studies already exist but I am ignorant of them (they did not come up in a short Google search). If so, show me!)