## Not suspicious at all

November 18, 2015

It’s not every day that you get an email like this:

Dear Dr. Renan Gross,
Greetings from Journal of Insights in Biomedicine
It gives us immense pleasure to e-mail an eminent person like you.

We have chosen few scientists who have contributed excellent work in the field of Medicine and it will be our honor if you could contribute a research, review, short Commentary.

Your valuable manuscript will be published in the upcoming issue, to boost the quality and value of our journal “Insights in Biomedicine Journal! “

Actually, I get one like it about once a week, ever since I joined the Weizmann Institute, and this is an interesting phenomenon by itself. But perhaps you should know a few facts first:

1. I just started my Masters a month ago. In expectation, I have at least five more years until reaching doctoral status.
2. The amount of “excellent work in the field of Medicine” that I have contributed is exactly 0.
3. The top three images when googling “eminent people” are of Malala Yousafzai, Abraham Lincoln, and Mahatma Gandhi. Further search reveals that, shame, I’m not even shortlisted.

This is not suspicious at all; in fact, I am deeply honoured! This email was certainly crafted personally for me, and I will be glad to contribute a research, review, and short Commentary. So I went to their site (http://biomedicine.imedpub.com).

Ok, so the homepage seems nice: they have a list of subtopics with descriptions that are related to biomedicine (including some weird ones, such as biophysics, and biomedicine itself…). On the right there is a list of suggested conferences, all of which are about a year(!) away and have an almost identical web-page because they sit under the same organizer (omics): there are no individual university conferences and this is not suspicious at all.

But enough about that – I want to see what sort of papers they already accepted, to see if I should write something in a similar format. They have an “articles” tab. Here is what they have under “articles in press”:

What is this? It seems like the cover page of a journal. But is it? It has no date. It has no “see page 47 for full paper”. It has NOTHING. And this is the only thing on the page.
They also have a “current issue” tab. It has EXACTLY the same image, and nothing else.
Ok, but what about past issues? Well, the “archive” currently holds the following: “No Volumes and issues availiable.. [sic]!”

Not suspicious at all.
One possible explanation is that it’s a new journal, and doesn’t have any older issues available. This would sort of fit in with the fact that they sent that highly-flattering-yet-totally-off mail: they are not well known yet, so they want to recruit more authors. (Exercise for the readers at home: try googling “Insights in Biomedicine” and explain the results).

We can explore the site a bit more. For example, go and see who the editors are. There are plenty: Paulo Marcos Pinto, who is a doctor from Brazil; Wei-Lan Yeh, who is a doctor from Taiwan; Dr. INTHRANI RAJA INDRAN, who is a doctor from Singapore (his name was written in caps on the site). And there are 11 more, all doctors and professors. Quite a big team, yes? You can try googling their names. While some of them are real professors with a university site and all, for the most part you don’t get as many results as you might think. It seems as if most of these people really haven’t published enough material in their life to have the needed expertise to edit a journal; at least, that’s by googling. (I wonder if they published enough to exist; not suspicious at all).

But it’s not like I care about the editors; what’s really important is handing in manuscripts. The “Author guidelines” quickly shows that submitting a manuscript costs anywhere from 320 USD to 520 USD (it’s an open access journal, after all). This is not a trivial amount, but other open access journals have been known to charge a lot more – the more famous ones may take thousands of dollars per submission.

Publishing with open access is not without costs. Journal of Neoplasm defrays those costs from article-processing charges (APCs) payable by authors onces [sic] the manuscript has been accepted for publication.Insights in Biomedicine does not have subscription charges for its research content, believing instead that immediate, world-wide, barrier-free, open access to the full text of research articles is in the best interests of the scientific community.

Whoops! What the hell is “Journal of Neoplasm”? Sounds like a journal name to me. Either the two hold a shared bank account, or else we witness a copy-paste error of the type that causes your code to crash in the middle of the night.

Luckily, Journal of Neoplasm also has a website (http://neoplasm.imedpub.com).

Not-suspiciously, it looks exactly the same as Insights in Biomedicine. Same colours and fonts and everything. In fact, they also have a “current issue” tab. It holds only this:

Needless to say, the archive shows “No Volumes and issues availiable..!”.

Maybe it’s time to visit the source: the publishing company behind these journals, Insight Medical Publishing. (http://www.imedpub.com). Their homepage shows all the journals they have under their wing, as well as the holy grail: recent papers!
Finally, we see some actual, peer reviewed papers! The “recent articles” box on their site contains eight papers. Some of them are not in English (but that’s ok, we don’t judge by language). The latest of them is from 2015 (no specific date, but there is something about “volume 6”), while the last one on the list is from 2014. Interesting – in the eight most recent papers, some are from 2014. That’s it? Well, you can press a “view more” button, but can you guess where that leads? Yup:

Yes, they have two !! in there, and that’s legitimate for a respectable publication company.
Now, you might be led into thinking that this is odd, and that there aren’t any papers at all in this entire network of journals, but that would be wrong. Because this error message is indeed followed by a list of all their journals, of which there are over 200. Every one of these has its own identical-looking website, complete with a list of editors, information for authors, ethical malpractice information, and articles.
Some of these journals have an astounding number of editors. I picked one at random, on orthodontics and endodontics, and it had 18 editors, one of which is “Vincenzo Grassia, Professor of the master 2014 ‘Orthodontic therapy in adult patients’ of the Sun”. Another, Journal of Informatics and Data Mining, has about 23 editors. Unlike Insights in Biomedicine and Journal of Neoplasm, this one had some papers in the “articles” tab: two grand issues, each of which has about six papers. Overall, this journal has more listed editors (each with a doctorate!) than papers.

It seems like people have put a lot of work into these sites, and yet many of them are almost empty. It turns out that “Insight Medical Publishing” is redirected to “OMICS” in Wikipedia, and there it is stated, “According to a 2012 article in The Chronicle of Higher Education about 60 percent of the group’s 200 journals had never actually published anything.” The same Wikipedia entry also has all sorts of scary words, like “predatory publishing”, “cease-and-desist letter”, and “false claims of affiliation”.
Now, I could say a lot of bad things about giving open access journals a bad name and luring unsuspecting scientists and how the hell did they get my email and know I’m into science did they dredge all of Weizmann’s address lists?!, but I digress. I think for the present, I’ll skip the opportunity to publish with Journal of Insights in Biomedicine. Maybe I’ll return to it at a later time, after I finish my homework on Markov Models.

## Book Review: The Secret Life of Germs

September 29, 2015

They are on your hands. They are in your food. They are underneath the carpet. They are in your gut. They are in your butt. They colonize your teeth. They prowl the house while you sleep. They crawl on your skin at this very moment. Chances are, they will kill you. If they won’t, they will eat your decomposing body. They are what decomposes your body.
There is no escape. There is no hope. There is only death.
Enter at your own peril, for here lies only doom and woe: presenting “The Secret Life of Germs”, by Philip M. Tierno Jr!

Surprisingly, this is not a horror-thriller book – it’s popular science. In general, it describes different types of germs – an inclusive name for fungi, bacteria, viruses and any other microbial badasses which have you on the top of their kill list – how you may interact with them in everyday life, and how you can prevent them from eating you alive (seriously, I’m not making this up. Go ahead and count how many times the phrase “flesh-eating” appears in the text).
After reading this book, you will know when to wash your hands, and which types of germ invasion you are preventing when doing so. You will also learn guidelines for preparing food, bathroom layout, handling pets, taking hikes, and basically anything else that relates to personal cleanliness. In short: it teaches you elementary hygiene, something which of course everyone should know, but at a much deeper level than you are used to. Which is nice.
But it’s also fucking scary. If you take its content at immediate face value you will not be able to finish it, because you will be curled up into a little ball, whimpering in the corner of a remote island while continually pouring bleach on yourself. This is because “the modern office is densely populated with objects that can harbor infectious germs”, “[dollar] bills are contaminated with germs of fecal, respiratory and skin origin”, “leaky vacuum cleaner kept resuspending Salmonella”, “the infant’s walker had a heavy growth of S. aureus”, and “The steering wheel was covered in beta hemolytic group A strep, which can cause strep throat or flesh-eating disease”.
See? Flesh-eating! These examples are from a pretty much random sample of 20 pages around the middle area, and they aren’t even the most frightening ones. How can anyone touch anything after reading this?
Of course, it could be that Tierno is exaggerating a bit, but his descriptions seem to me to be accurate enough and fit in with what I already know – that bacteria can live practically anywhere and in almost all conditions. I guess the real thing to be learned here is this: that the human immune system is one badass piece of machinery, which successfully deflects innumerable invasions, aggressions, sieges, infiltrations and all out bombardments without us so much as flinching. I tend to get sick about once a year on average, usually with a seasonal cold. There’s an army of tiny flesh-eating soldiers out there just waiting to get me, but all I get is a couple of coughs and sneezes.
Still, Tierno describes a dangerous reality, and takes large precautions to avoid those dangers. If you follow his instructions, you will probably wash your hands about 100 times a day (and remember, effective hand-washing requires at least 20-30 seconds of soap-rubbing, including underneath the fingernails). You will change the clothes you wear to the movies, the food you eat, and the number of times you pet the dog.
This book is therefore very irritating. The advice Tierno gives seems sound (at least if you really want to avoid germ contact; we won’t discuss “training your immune system”). It’s logical. It’s clear. It makes sense. But it’s also annoying – it involves being “picky”, “overly-hygienic”, and changing habits that I have acquired throughout my whole life. It will require increasing my hand-washing time by an order of magnitude. It will require being conscious of the horrible world of the germs at all times. It involves “not eating sushi” at all because there is a chance that uncooked fish carry a Vibrio germ. It requires checking for fleas and ticks on my groin and behind the ears every time I walk through the woods. In short – it presents a mild inconvenience. This, at the benefit of a potentially longer life span and less sickness. It adds another “worry” to your world.
I’m in conflict. The rational part of me says, “you would be a fool not to embrace this advice. If you don’t, you can forget about me helping you when you contract a new strain of SARS”. The lazy and doesn’t-want-to-be-disturbed part of me says, “I have lived all my life as I do now, and my surroundings and neighbourhood act as I do. We are generally alright; why live your life in worry?”

I guess the question is, “is the extra worry and ritual worth the expected benefit in your life expectation and comfort (due to less sickness)?” To each his own answer. But the good thing is, you can take as much as you want from the book, and leave the rest alone. While I will not embrace the full extent of its writing, it has definitely made me more aware of the general germliness of the world, and probably will affect my overall behavior. To all you germs – make much your time.

August 24, 2015

There’s a nicely morbid Perry Bible Fellowship comic showing a kid on his birthday:

That’s what you get with strict determinism: the day we die is already decided on the day we are born; on the day the universe was born, in fact. But since we ordinary mortals do not have access to Death’s all powerful computing machine that lets him calculate the end of all tales (an abacus?!), we have to use statistics. Specifically, we like to use life expectancy, which is basically the average age of death of a certain population at a certain time.
Calculating it is rather easy for times far away in the past. What was the life expectancy for infants in 1900? Just look at all the people who were born in 1900, and take the average of their life spans. For modern times it’s a bit more complicated, since there are still so many yucky living people who spoil your statistics, but we get good results under reasonable assumptions.

Here’s a cool thing about life: if you haven’t died so far (and, as you are reading this, I assume you haven’t), you will statistically outlive the average baby born in the same year as you. In other words, the older you are, the older you die. This is pretty obvious, but it’s a nice pat on the back as it shows your accomplishments in the field of “not dying”. Are you 30? Congratulations! You should no longer be afraid of dying from chicken pox, drowning in a bucket, baby measles and child-malnutrition; and your chances of going to war are severely reduced. So you have to take all those deaths out of the equation, and the net result is that your projected age goes up.

But there is a contrasting force in this whole ordeal: the older you are, the less time you have to live – because you’ve already lived some portion of your life. The life expectancy of an infant born in the United States in 2013 is about 78 years. She has 78 years ahead of her. The life expectancy of a 90 year old in the United States in 2013 is 94. She will die older (she has already passed the 78 year mark), but only has 4 years ahead of her.

This leads to the question: what is “the best age to be alive”, in the sense that “your whole life ahead of you” is the longest period of time? This depends crucially on the mortality statistics: we can imagine that in a world where most children don’t make it to age 10, people who are 30 will have more to look forward to than five year olds.
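To make the “best age” idea concrete, here is a minimal sketch of how the remaining life expectancy at each age can be computed from a table of yearly death probabilities. The table below is entirely made up (it is not the Massachusetts or the US data), and the half-a-year convention for the year of death is an assumption of the sketch.

```python
def remaining_life_expectancy(death_prob):
    """Expected further years of life at each age.

    death_prob[a] is the probability that someone alive at exact age a
    dies before reaching age a + 1. The last entry should be 1.0, so that
    nobody outlives the table. Deaths are assumed to fall, on average,
    halfway through the year in which they occur.
    """
    remaining = [0.0] * (len(death_prob) + 1)
    # Work backwards from the oldest age: either you die this year
    # (half a year lived) or you survive it and face next year's odds.
    for age in range(len(death_prob) - 1, -1, -1):
        q = death_prob[age]
        remaining[age] = q * 0.5 + (1 - q) * (1.0 + remaining[age + 1])
    return remaining[:-1]


# Hypothetical mortality table with heavy child mortality (invented numbers).
toy_table = (
    [0.15]            # first year of life
    + [0.04] * 4      # ages 1-4
    + [0.01] * 5      # ages 5-9
    + [0.008] * 50    # ages 10-59
    + [0.05] * 20     # ages 60-79
    + [0.20] * 20     # ages 80-99
    + [1.0]           # nobody survives past 100 in this toy world
)

remaining = remaining_life_expectancy(toy_table)
best_age = max(range(len(remaining)), key=remaining.__getitem__)
print(f"newborn: {remaining[0]:.1f} years ahead")
print(f"best age: {best_age}, with {remaining[best_age]:.1f} years ahead")
```

In this toy world the best age is not zero: once the dangerous first years are behind you, you have more years ahead of you than a newborn does.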

In fact, this is what happened in Massachusetts during the 1850’s (data found here):

In orange, we see how many years a person has left to live as a function of age in Massachusetts in 1850. We see that a baby just born has roughly 40 years ahead of her, while a child who made it to age 10 has about 47 years ahead of her! In the 1850’s during childhood, every year you grow older actually increases your life expectancy! In view of the comic at the beginning of the post, for these children, every time they celebrate their birthday, Death should add a bead to their life total (statistically speaking, of course; deterministically we are all doomed anyway). Newborn babies and 20-year olds can expect to live the same amount of time.
The yellow line plots the function y = x. The place where it crosses the orange line is the “half-life” point – the age where you have put as much time behind you as you have in front of you.
Finally, in blue, we see the average age of death. It continuously rises as a function of the age of the person in question. Notice that it is always above the yellow line y = x: This just means that when you look at a specific living person, that person will die older than she is right now. Unfortunately the data I have only goes up to 80 years, but eventually the blue line and the yellow line should coincide, at the age of the oldest person who ever lived.

But that was way back a hundred and fifty years ago. With the advancement of modern medicine, food, and infrastructure, are things any different now? Indeed they are. Here is the United States data for 2013 (data found here):

First, life expectancy went up; that’s a no brainer. But what’s interesting is that now the orange line no longer has a peak in the beginning. Further, the blue curve stays almost flat for the first ~40 years or so of life: The difference in the expected age of someone who is 40 and someone who was just born is a mere 4 years; contrast this to the whopping 29 years in 1850.
We have so completely decimated child and youth mortality, that it no longer “pays off” to grow older, in terms of gains in life expectancy. Looking at the data shows that this phenomenon – the lack of orange peak – started in the 1920’s, at least in the United States.
So the Death and the Abacus comic is indeed relevant – but only for the modern era. In the past, getting to 10 would have warranted a real celebration – for that is the age where “you’ve got your whole life ahead of you” carries the most weight. But today? The countdown starts with your first gasping scream for air.

## A primitive model for genetic recombination

August 17, 2015

Introduction:
I’m taking a class in general genetics at the Technion, and there we learned about genetic recombination, and in particular, homologous chromosome crossover: a phenomenon where chromosomes exchange sections between themselves during meiosis.
When this happens, some members of the population exhibit “recombinant genomes”, which differ from what their parents’ genomes would supposedly generate. Surprisingly, this part of the population never exceeds 50%, even though at first glance it seems as if it could.

In this post, we’ll see a model of chromosomal crossover statistics that explains this phenomenon, as well as giving an estimate of the physical distance between genes as a function of population statistics. I’ll assume you know some basic genetic terms, such as “dominant” and “heterozygote”, but I’ll explain about crossovers in general and describe the problem in more detail below. You can skip directly to “The model basics” if you already know about recombination.
The post will be about 50% biology and 50% math.

Biological background:
We’ll work through an example with the commonly used traits of EYE COLOR and WING-SIZE in fruit flies. Both are controlled by single genes found on the X chromosome.
A fly’s eyes can be either red or white, with red being a dominant quality. We’ll mark the dominant gene with R and the recessive gene with r. Thus, if a fly’s genome contains Rr or RR, it will have red eyes; otherwise, if it contains rr, it will have white eyes.
Similarly, a fly’s wings can be either long or short, with long being a dominant quality. We’ll mark the dominant gene with W, and the recessive with w, so long winged flies have Ww or WW, and short winged flies have ww as their genotype.

Suppose we have a heterozygote cis female. In other words, her genome contains both the dominant and the recessive genes (so she has RrWw in her genome), and both of the dominant genes are found on the same homologous chromosome. In other words, her two X chromosomes look like this:

During meiosis, her two homologous chromosomes duplicate and then separate, and we get two types of possible germ cells: RW and rw:

However, it is also possible for crossover to occur: two chromatids are “sliced” at some point, and then the two parts of each are glued to each other.

If this happens during meiosis, the outcome is four possible germ cells: RW, Rw, rW, rw:

Now, what happens when we mate our RrWw female with a white eyed, short winged male? Since these traits are found on the X chromosome, and a male fly only has one of those, he necessarily has the recessive allele, rw. We don’t care about the Y chromosome here.

Upon mating, the male fly will give the offspring either an X or a Y chromosome. Let’s ignore the males at this point, and focus just on the females. Since our male’s genotype is rw, we will get the following combinations: RrWw, rrww, Rrww, rrWw. All of these are phenotypically different, and each represents a different combination of red/white eye and long/short wing. The Rrww and rrWw genotypes are recombinant – they only exist in the population because of recombination.

Suppose now that the chance for recombination between R and W is some number q between 0 and 1. Then if we look at a very large collection of germ cells from the mother, we expect the following distribution:

RW should be $\frac{1}{2}(1-q)$ of the germ cell pool
rw should be $\frac{1}{2}(1-q)$ of the germ cell pool
Rw should be $\frac{1}{2}q$ of the germ cell pool
rW should be $\frac{1}{2}q$ of the germ cell pool

This is because q of the population should be recombinant, and whenever there is recombination we get an equal amount of Rw and rW.
After mating, when looking at the females, we only need to add the father’s recessive genes, and we get:

RrWw should be $\frac{1}{2}(1-q)$ of the population
rrww should be $\frac{1}{2}(1-q)$ of the population
Rrww should be $\frac{1}{2}q$ of the population
rrWw should be $\frac{1}{2}q$ of the population

Thus, Rrww and rrWw comprise $\frac{1}{2}q+\frac{1}{2}q = q$ of the population. This can be measured in real experimental trials, since each of the above genotypes translates into a different observable phenotype.
At this point in our theory, q can be any number between 0 and 1. If q is 0, then there is never any recombination, and the two genotypes RW and rw go hand in hand forever. If q is 1, then recombination always happens.
However, it is an empirical fact that the percentage of recombinant population is never more than 50%! The measured value of q is always less than or equal to 0.5.
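The bookkeeping above, from q to the expected daughter genotypes and back from counted daughters to an estimate of q, can be sketched in a few lines. The counts are hypothetical, invented only to show the arithmetic, and the function names are mine.

```python
def expected_daughter_fractions(q):
    """Expected genotype fractions among daughters of an RW/rw mother
    and an rw father, given recombination probability q."""
    return {
        "RrWw": 0.5 * (1 - q),  # parental
        "rrww": 0.5 * (1 - q),  # parental
        "Rrww": 0.5 * q,        # recombinant
        "rrWw": 0.5 * q,        # recombinant
    }


def estimate_q(counts):
    """Fraction of recombinant daughters in a table of observed counts."""
    recombinant = counts["Rrww"] + counts["rrWw"]
    return recombinant / sum(counts.values())


# Hypothetical counts from an imagined cross of 1000 daughters.
observed = {"RrWw": 412, "rrww": 398, "Rrww": 93, "rrWw": 97}
q_hat = estimate_q(observed)
print(q_hat)  # 190 recombinants out of 1000
```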

There must be some mechanism that prevents recombination from happening too often. We can make appeals as to the utility of this mechanism, and wonder whether it is good or bad to have a small number or a large number of recombinations between genes – but for now, let’s try to think of an underlying model.

Image source: wikipedia

The model basics:
We treat the chromosome as a linear piece of DNA, with a length of “1 chromosome” – in essence, it is a line segment of length 1. The different genes are points on this line, and are therefore assigned a position $0 \leq pos \leq 1$. In reality genes have some finite width on the DNA strands so a more accurate model will treat them as small intervals, but it will be easier to consider them as points.
We’ll assume that the gene that codes for eye color is on the left of the gene that codes for wing size. Denoting the position of the first by x and the second by y, we have this schematic for our chromosome:

The primary element in our model is the crossover event, or a cut. In this event, two homologous chromosomes are cut at a random place, distributed uniformly across their entire length. The chromosomes then swap strands at this position.

There are two options here. If the cut occurs in the interval between x and y, the genes will be swapped, and we have recombination. However, if the cut occurs outside the interval [x,y], then those two genes will not be affected. Since the cut distribution is uniform, the chance to land between the two genes is just y-x, so the probability of recombination is $q = y - x$.
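A quick way to convince yourself of the single-cut rule is to simulate it. This is only a sketch: the gene positions are hypothetical, and the seed and number of trials are arbitrary choices.

```python
import random


def single_cut_recombination(x, y, trials=200_000, seed=0):
    """Monte Carlo estimate of the recombination probability when exactly
    one cut is made, uniformly at random, on a chromosome of length 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cut = rng.random()      # cut position, uniform on [0, 1)
        if x < cut < y:         # the cut separates the two genes
            hits += 1
    return hits / trials


# Hypothetical gene positions, chosen only for illustration.
x, y = 0.2, 0.5
estimate = single_cut_recombination(x, y)
print(estimate, "vs", y - x)
```

The simulated fraction should sit close to y-x, up to sampling noise.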

This is a simple operation, and it’s tempting to think that it is the entire deal, but this is not so. In a crossover event, if two genes are far away from each other, meaning, at the opposite sides of the chromosome, then the probability of recombination can be very close to 1: nearly every cut we make will separate them. But we never observe a q above 0.5! There is obviously something more that we are missing here.

Image source: Science magazine

The answer: the above description is true only for a single crossover event – a single cut. However, there is no guarantee that a chromosome will undergo any crossovers at all during meiosis. Further, a chromosome may actually undergo several crossover events, as was experimentally discovered when looking at the recombination relations between a triplet of genes on the same chromosome. But look what happens when there are two crossover events in the same interval [x,y]: the strands are switched twice, and ultimately there is no recombination between the two genes!

We can now convince ourselves: whether or not we see recombination between two genes depends on the parity of the number of crossover events that occurred between them. When looking at the population statistics, what we ultimately see is the average of the parity of crossovers.
As an artificial example, suppose that during meiosis, there is a 50% chance of performing a single cut, and a 50% chance of not performing any cuts at all. In that case, for two far away genes, which are always separated by any cut, there is a 50% chance of getting recombination, and 50% chance of not getting it. In other words, q was reduced from 1 to 0.5. In general, in this case the observed probability of getting recombination is $q = \frac{1}{2}(y-x)$, as half the time we do not get a recombination at all.
Of course, there is no reason to assume that there is a 50% chance of getting no crossover event, and 50% of getting exactly one – the number of crossovers could behave in different ways – but we see that the actual percentage of recombinant population depends on the distribution of the number of crossover events in the chromosome. Which distribution should we choose?
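Before picking a distribution, the parity argument can be turned into a small calculator that works for any distribution of the number of cuts. This is a sketch under the same assumptions as the text (each cut lands between the genes independently, with probability y-x); the function name and the dictionary representation are mine.

```python
from math import comb


def recombination_probability(cut_pmf, distance):
    """Probability of an odd number of cuts landing between two genes.

    cut_pmf maps a number of cuts k to its probability; each cut lands
    between the genes independently with probability `distance` (= y - x).
    """
    total = 0.0
    for k, prob_k in cut_pmf.items():
        # Sum the binomial probabilities of 1, 3, 5, ... cuts in between.
        odd = sum(
            comb(k, j) * distance**j * (1 - distance) ** (k - j)
            for j in range(1, k + 1, 2)
        )
        total += prob_k * odd
    return total


# The artificial example from the text: no cut or one cut, 50% each.
coin_flip = {0: 0.5, 1: 0.5}
print(recombination_probability(coin_flip, 1.0))   # far-apart genes: 0.5
print(recombination_probability(coin_flip, 0.4))   # 0.5 * 0.4 = 0.2
# Always exactly two cuts, both between the genes: the swaps cancel.
print(recombination_probability({2: 1.0}, 1.0))    # 0.0
```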

A slightly flawed offer:
A simple choice would be a binomial distribution. The reasoning goes as follows: during meiosis, there are all sorts of enzymes floating about the chromosomes, which are responsible for cutting them up and gluing them back together. There may be a large number n of these enzymes floating about, but they only have a certain probability p of actually performing their duty. Of course, we assume that they act independently, even though in reality they may interfere with each other. So the number of crossovers depends on the number of “successes”, where a success is an enzyme doing its work properly, which happens with probability p. This means that the number of cuts distributes according to $C \sim Bin(n,p)$.

So assuming the number of crossover events distributes according to $C \sim Bin(n,p)$, what is the probability of getting an odd number of crossovers? Let’s take a moment to calculate it.

For any $n$, denote that probability by $P_n$. Suppose you already checked $n-1$ of the enzymes. Then with probability $P_{n-1}$, you already have an odd number of crossovers, so you don’t need any more of them. Further, with probability $1-P_{n-1}$, you have an even number, and you want another crossover to get an odd number. So the probability obeys the recurrence relation

$P_n = P_{n-1}(1-p)+(1-P_{n-1})p.$

with the initial condition that $P_0=0$, as if there are zero enzymes there are zero crossovers, which is an even number.
More nicely:

$P_n = P_{n-1}(1-2p)+p$

$P_0 = 0.$

If we look at just this equation:

$P_n = P_{n-1}(1-2p)$

we quickly see that the answer is $P_n= a \cdot (1-2p)^n$. However, we also have that additive +p in our original equation. It turns out we only need a small adjustment in order to compensate for it though, and in this case we just have to add an extra constant, so that

$P_n = a \cdot (1-2p)^n + c.$

Since the equation is linear, this is actually very much like the particular solution of a differential equation, and we can find c directly by putting it into $P_n$ in the recurrence relation:

$c = c (1-2p) + p,$

which gives

$c = \frac{1}{2}.$

Taking into consideration the initial condition $P_0 = 0$, we get $a = -\frac{1}{2}$, and the solution is then

$P_n = \frac{1}{2} - \frac{1}{2}(1-2p)^n$

Wonderful! For very large n, the probability of getting an odd number of crossovers goes to 0.5! Even for relatively low probabilities p, the quantity $(1-2p)^n$ goes to 0 very quickly.
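If you don’t trust the algebra, the recurrence and the closed form are easy to compare numerically. A minimal sketch; the values of n and p are arbitrary picks, not measured quantities.

```python
def p_odd_recurrence(n, p):
    """Iterate P_n = P_{n-1}(1-p) + (1-P_{n-1})p starting from P_0 = 0."""
    prob = 0.0
    for _ in range(n):
        prob = prob * (1 - p) + (1 - prob) * p
    return prob


def p_odd_closed_form(n, p):
    """The closed form 1/2 - 1/2 (1-2p)^n."""
    return 0.5 - 0.5 * (1 - 2 * p) ** n


# Hypothetical parameter values, picked only to show the convergence to 1/2.
for n in (1, 5, 20, 100):
    print(n, p_odd_recurrence(n, 0.1), p_odd_closed_form(n, 0.1))
```

The two columns should agree up to floating-point rounding, and both creep towards 0.5 as n grows.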

This gives an answer regarding two genes which are very far away: they are affected by every cut performed by the enzymes, and so their recombination probability is exactly the same as the probability for getting an odd number of cuts. But what about genes which are closer? For them we actually have to take into consideration the fact that not every cut the enzymes make will cause a crossover.
Notice the following: the number of cuts in every chromosome is distributed binomially, $C \sim Bin(n,p)$. If we already know the number of cuts to perform – say, k – then the number of cuts which affect the two genes at positions x and y is also distributed binomially as $Bin(k,y-x)$, since every cut has a probability of y-x of crossing the two genes. So the number of crossovers S between y and x, conditioned on $C = k$, is $Bin(k,y-x)$, and k itself distributes as $Bin(n,p)$.
Now comes the cool part: there is a theorem about binomial distributions which says the following: if X is a random variable that distributes binomially, $X \sim Bin(n,p)$, and Y is a random variable that conditioned on X distributes binomially, $Y|X \sim Bin(X,q)$, then Y is also binomial, $Y \sim Bin(n, pq)$! Using this theorem, the number of cuts S which swap between x and y goes as $S \sim Bin(n, p \cdot (y-x))$.
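The theorem can be sanity-checked numerically by summing over all possible numbers of cuts and comparing with the claimed binomial. A sketch with hypothetical values for n, p and the distance d = y-x; it checks one set of values, it does not prove anything.

```python
from math import comb


def binom_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def thinned_pmf(n, p, d, s):
    """P(S = s) when C ~ Bin(n, p) cuts are made and each one lands
    between the genes with probability d, i.e. S | C ~ Bin(C, d)."""
    return sum(binom_pmf(n, p, k) * binom_pmf(k, d, s) for k in range(s, n + 1))


# Hypothetical values for n, p and the gene distance d = y - x.
n, p, d = 12, 0.3, 0.25
largest_gap = max(
    abs(thinned_pmf(n, p, d, s) - binom_pmf(n, p * d, s)) for s in range(n + 1)
)
print(largest_gap)  # tiny: floating-point rounding only
```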
Now we can apply the same reasoning as before, only this time, a “success event” is not merely when the enzymes perform a crossover anywhere on the chromosome, but rather when they perform it in some place between x and y.
The final probability of getting recombination between two genes is then

$q = \frac{1}{2} - \frac{1}{2}(1-2p(y-x))^n$

This is very nice, and it gives us some asymptotics as well. For large values of $np(y-x)$, the second term is negligible, and we have $q \approx \frac{1}{2}$. For small values of $np(y-x)$, the second term can be expanded to first order, $(1-2p(y-x))^n \approx 1 - 2np(y-x)$, and the two $\frac{1}{2}$’s will cancel each other out, giving us $q \approx np(y-x) \propto (y-x)$.

Slightly improving the binomial:
Overall, the model proves adequate in its predictions, and its simplicity is alluring. However, it is not without problems. For example, its two parameters – p and n – must somehow be found out, and it is not entirely clear how to do so. In fact, the very fact that we have a fixed n here seems out of place: by keeping it around, we assume that there is a constant number of enzymes working about, when it is much more reasonable that number varies from cell to cell. After all, when manufacturing hundreds or thousands of enzymes, there must be variation in the numbers.

Luckily, there is a simple way to fix this, which is actually firmly based on reality. Instead of assuming that the number of cuts the enzymes make is distributed binomially, we assume it follows a Poisson distribution, $C \sim Pois(\lambda)$, for a yet unknown $\lambda$. This actually makes a lot of sense when we remember that Poisson distributions are used in real life to describe queues and manufacturing processes, when what we know is the average time it takes to perform a single event.
If the number of overall cuts has a Poisson distribution, how does the number of crossovers between x and y behave? Well, given that the number of cuts is k, the number of crossovers is still as before, $Bin(k, y-x)$. But again the theorems of probability smile upon us, and there is a theorem stating that if $C \sim Pois(\lambda)$ and, conditioned on $C = k$, we have $S \sim Bin(k,y-x)$, then

$S \sim Pois(\lambda(y-x)).$

So the distribution of crossovers between x and y will also follow a Poisson distribution!
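For the curious, the theorem can be checked directly from the definitions. Here is a short sketch, writing $d = y-x$ for brevity (the shorthand $d$ and the summation index $j = k-s$ are introduced here only for this calculation):

$P(S=s) = \sum_{k=s}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!} \binom{k}{s} d^s (1-d)^{k-s}$

$= e^{-\lambda}\frac{(\lambda d)^s}{s!} \sum_{j=0}^{\infty} \frac{(\lambda(1-d))^{j}}{j!} = e^{-\lambda}\frac{(\lambda d)^s}{s!} e^{\lambda(1-d)} = e^{-\lambda d}\frac{(\lambda d)^s}{s!},$

which is exactly the probability of getting $s$ events under $Pois(\lambda d)$.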
Now we only have to remember the simple trick, that

$Pois(\lambda)= \lim_{n \rightarrow \infty} Bin(n,\frac{\lambda}{n}).$

Thus, under the assumption of a Poisson distribution, the final probability of getting recombination between two genes is

$q = \frac{1}{2} - \lim_{n \rightarrow \infty} \frac{1}{2}(1-\frac{2 \lambda(y-x)}{n})^{n},$

or, more simply,

$q = \frac{1}{2} - \frac{1}{2}e^{-2 \lambda (y-x)}.$

This again has the same desirable properties as before, but the model is simpler: we got rid of the annoying n parameter, and the probability parameter p was replaced by the rate parameter $\lambda$.
(Note: For small values of (y-x), the probability for recombination is $q \approx \lambda(y-x)$; if only we could set $\lambda = 1$ and get a direct relationship between q and the distance between genes…)
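The Poisson model is easy to try out in a simulation. The sketch below is only an illustration under stated assumptions: the chromosome is taken to be the unit interval, each cut lands uniformly at random on it (so it falls between x and y with probability y-x, as in the text), and the value of $\lambda$ and the gene positions are made up. The helper names are hypothetical too. The simulated fraction of odd crossover counts is compared with $\frac{1}{2} - \frac{1}{2}e^{-2\lambda(y-x)}$, the $n \rightarrow \infty$ limit of the binomial formula.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Pois(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_recombination(lam, x, y, trials, rng):
    """Fraction of trials with an odd number of cuts landing between x and y."""
    recombinant = 0
    for _ in range(trials):
        cuts = sample_poisson(lam, rng)
        # Assumption: every cut position is uniform on the unit interval.
        between = sum(1 for _ in range(cuts) if x < rng.random() < y)
        recombinant += between % 2
    return recombinant / trials

def predicted_recombination(lam, x, y):
    """Limit of the binomial formula: q = 1/2 - 1/2 * exp(-2 * lam * (y - x))."""
    return 0.5 - 0.5 * math.exp(-2 * lam * (y - x))

rng = random.Random(0)
lam = 3.0  # hypothetical rate, for illustration only
for x, y in [(0.40, 0.45), (0.2, 0.5), (0.0, 1.0)]:
    simulated = simulate_recombination(lam, x, y, 100_000, rng)
    print(y - x, simulated, predicted_recombination(lam, x, y))
```

For nearby genes the simulated value should sit close to $\lambda(y-x)$, and for genes at opposite ends of the chromosome it should approach, but not exceed, one half.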

To conclude:
The percentage of recombinant phenotypes in a population of offspring is always smaller than 50%. This is not an arbitrary number, but stems from the underlying biological mechanism of recombination. Because multiple crossover events can occur between two genes, what’s ultimately important is the parity of the number of such events. When the total number of crossovers in a chromosome follows a Poisson distribution, the parity can be readily computed. It behaves like a fair coin toss for genes which are physically far away from each other, but is linear in the distance between the genes when they are close to each other.
This “Poisson crossover model” is very simple, and of course does not explain all there is about recombination (genes are not points on a line; distribution is probably not Poisson; events are not independent; there are “recombination hotspots” in chromosomes; the chromosome is a messy tangle, not all of which is accessible; etc). But it looks like a good starting point, and to me seems adequate at explaining the basic behaviour of recombination.

## Book review: Imperial Earth

August 8, 2015

By our beloved free encyclopedia, “Hard science fiction is a category of science fiction characterized by an emphasis on scientific accuracy or technical detail, or on both.”
When I (discreetly) mentioned to my friend that I might be interested in that sort of thing, he suggested reading Arthur C. Clarke’s “Rendezvous with Rama”. I did, and found it glorious. But perhaps we should mention another book by Clarke as a runner up for the distinguished merit award in that category. Here is an excerpt, recounting the protagonist’s experience at a fancy party:

Yes folks, your eyes deceive you not. As for me, I was rather lucky – for just half a year ago I took the third-year, fifth-semester course in solid state physics in which students learn about the Drude-Sommerfeld model for electron conduction in lattice-based systems – so I was able to calculate Duncan’s mean free path and draw a mental picture of his quantum scattering.
Intrigued? Want to read a sci-fi political novel while simultaneously taking a crash course in radio astronomy and polyominoes? Say no more! I present to you, Imperial Earth!

There is an entire chapter on pentominoes – a cool math puzzle that involves fitting twelve tetris-like pieces together to form various shapes. No ink is spared when describing the combinatorial explosion problem and the comparison between brute force computing and human creativity.
There are several long sections describing the difficulties of picking up long wavelengths on Earth, and how mankind can cope with them. There are considerations of the harsh reality of space travel and communication. There are even numerous paragraphs on interior design (of underground accommodation facilities on Saturn’s moon Titan, of course).
Of course, I exaggerate a bit. There’s still a plot, some character development, and a few mysterious schemes. But the reader who expects too much in this area will come out disappointed – these are not the book’s strong points. Rather, they serve mostly as an excuse for Clarke to show off and preach some of his thoughts and ideas about future society.
And this indeed is how you should approach the book: as a collection of ideas. Duncan travels through a world where we have started colonizing the planets; this takes both time and technology, and Clarke fills in the gaps as he wills. Some ideas are textbook standards, like a unified world government that (somehow) manages to hold the peace on Earth. Others are more amusing with roots in the Classics, such as having the US president be picked at random from a pool of potential candidates: one of the required qualifications is not wanting to be president, and successful presidents get “time off for good behaviour”. And of course, there are more unique ones, such as Earthmen’s obsession for preservation of the past, despite (or perhaps because of) the marvellous technological advances.
It is true, some of the themes and events are underdeveloped, and the book cannot go down every branching path it presents. Various holes scatter the landscape, both in the overall plot and in the little details that make the world go tick: it is nice to have US presidents who are picked at random, but how this system stays in faithful unbiased hands is left for us to wonder.
But perhaps that’s a good thing.

## Our beautiful

August 3, 2015

I recently got back from a sixteen day vacation in Croatia. Croatia is a beautiful country, renowned for its lush forests, crystal lakes, stunning mountains, and dreamy coast. The natural thing to do, therefore, is to stay cooped up for almost two weeks in the Gymnasium of the small town of Požega, and teach three high schoolers about laser displays.

Sounds a little similar to last time, I know. But this time, unlike the last, I actually got to see some of those lush forests, crystal lakes, stunning mountains, and dreamy coast. Indeed, after the hectic Summer School of Science, I finally managed to get a road trip in Our Beautiful Homeland. Here are some photos; a more technical description of the school might appear later.

It all started in Zagreb. Cloudy with a chance of sweat and Egyptians.

Zagreb’s main square is truly a wonderplace. Not a day goes by without some artistic or cultural event. I have at least one more post I want to write about some interesting cultural phenomenon I saw there. But that will wait for another time, as the Summer School of Science was just around the temporal corner.

After a 7 half-hour bus ride to Požega, S3 started. I will not go into detail here, just post pretty results: the students built laser scanners using speakers, and used them to generate Lissajous curves (and more!).

After the camp there followed several more half-hours of bus riding, followed by even more bus riding. Overall, buses are a good place to be. Some of them even have wifi.
But all that sitting quenched inside various means of public transportation certainly paid off. For now, in prophetic order, we have our sightseeing. Lo and behold the prophetic order!
Lush forests and crystal lakes:

Stunning mountains:

Dreamy coast:

Ah, Croatia, you are as mysterious as you are beautiful. It’s almost needless to say, you truly have captured me.

## Book review: The Andromeda Strain

July 9, 2015

*** spoiler included ***

Do you know that feeling where you say to yourself “I’ll just check out this article on Wikipedia” and then five hours later when you finally raise your head from the screen and gasp for air after having dredged half the internet you cannot help but wonder “where the hell did seven hours go?!”?
That is “The Andromeda Strain” by Michael Crichton, for the better and worse of that statement. On the one hand, it’s a page turner; you’ll have to take care not to tear the pages as you blaze through them faster than the speed of sound. On the other hand, at the end, you’ll sort of want those seven hours back.

Crichton wrote a very engrossing and thrilling book. Merely the basic premise – an emergency team handling the outbreak of an alien microbe – commands us to think for a moment how complicated indeed First Contact would be with any extraterrestrial race. This is an interesting and thought provoking topic, and science fiction is undoubtedly filled with contact books, speculating on an entire range of scenarios. Meeting with a disease is an original one, that seems obvious in hindsight in a satisfying kind of way. The possibility of an alien pandemic that threatens Earth’s entire population with near instant death or insanity is definitely page-turning material. The book also imitates the form of a classified report, and this adds realism to the sense of what-will-happen-next exhilaration.
It’s too bad that as you turn the final pages, exhilaration turns to disappointment, and all the pent-up tension, all the built-up potential energy dissipate to nothing. The disease simply and spontaneously “turns dormant”. Nothing happens to major American cities. Millions of lives are not compromised. No megapolis is evacuated. The scientists working on the project did not save the day; in fact, disaster was averted only because their advice was ignored. This, despite the fact that the book constantly warns you, “and then the scientists made their second, crucial mistake”.
“Ok”, you might say, “so the book focuses mainly on the investigation going on in the research laboratory, instead of the dangers of the outside world. How is that so different from Rendezvous with Rama?” Well, there are at least two differences.
First, in Rama the exploration is so obscenely fascinating, and mankind’s technology is so pitifully crude compared to the Ramans’, that the team’s feeble attempts just multiply the overwhelming awe conjured by the book. The whole point of Rama was exploration. By contrast, The Andromeda Strain sells itself as one about preventing a major disaster, but ultimately doesn’t deliver on that front. The scientific investigation is exciting, but is not sustainable on its own without the knowledge that failure to contain the outbreak will have catastrophic consequences.
Second, there’s a lot of science in this book, but it falls short of convincing the slightly trained eye. While some of the methods used by the investigation team are Freakin’ Cool, the scientists sometimes perform tests in a manner so sloppy you would expect better from a freshman undergraduate. A freshman majoring in History, mind you. And the book uses “evolution of microbes” in such a grossly wrong way, it would have been better off to just blame it all on “the almighty hand of creationism”.

But do not take this review too harshly, dear reader. If disappointment is the main flaw of this piece, then the real problem is just the expectation. Give this book a try! You’ll read it quickly enough. Just keep in mind that today is not the day for a post-apocalyptic Andromeda-quarantined America.