Intelligence is inexplicable

April 8, 2014

So, the question still remains – after spending endless nights in front of a bright screen, surrounded by empty pizza boxes and the soft hum of cooling fans – when does your program finally count as “intelligent”? It may have started out as a simple “hello_world.c”, but by now it’s a raging monstrosity, fully imbued with seven-layered neural networks, evolutionary algorithms, and a taste for human blood. Doesn’t that make it an artificial intelligence, even if it can’t pass the Turing test? (To be honest, I know many real-life people who wouldn’t pass the Turing test themselves.)
Yes, many papers have been written on this, and the walls of myriad rooms are covered with Eastasian symbols, but still, it will do no harm to spew out my own two cents. I guess I could search the relevant literature first, but there’s no fun in that, is there?

tl;dr? The main points, summarized:

- An intelligent machine must be able to solve problems it has never encountered before.
- If a machine can be reduced to an algorithm which humans understand, it is not intelligent.

These are necessary, but perhaps not sufficient, conditions.

Since this is just a definition, and we won’t do anything rigorous with it, I can afford to simply look at some specific cases and use inductive arguments in rash abundance.

We’ll start with an example. Consider the task of sorting numbers: we have a list of numbers, and we want to arrange them from smallest to largest.
I think it’s safe to agree that a preprogrammed computer running “heapsort” would not be considered intelligent. After all, it’s just using a predefined algorithm, one which was mathematically proven to work, line after line. There’s no intelligence in that. It’s a simple mechanical process, with a sole and not-very-surprising outcome.
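For concreteness, here is the kind of mechanical sorter I have in mind – a minimal heapsort sketch in Python, leaning on the standard library’s heapq module for the heap bookkeeping:

```python
import heapq

def heapsort(numbers):
    """Mechanically sort a list: build a heap, then pop the minimum repeatedly."""
    heap = list(numbers)
    heapq.heapify(heap)  # O(n) heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

Every step is provably correct, and the outcome is never in doubt – which is exactly the point.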
But give a child (i.e., anyone with no knowledge of heapsort) a stack of cards and ask her to sort them, and she’ll eventually succeed. Perhaps she won’t do it in the most efficient way possible, but let’s look at the circumstances – she was not preprogrammed with a built-in sorting algorithm. All she had to start with was the goal – “I’ll give you some chocolate if you sort these cards for me” – and eventually she found a way to solve the problem – perhaps by drawing on her previous experiences, or perhaps by some Divine Inspiration – who am I to judge?

I say then, that an intelligent being is one which is not designed to specifically solve one type of task – except, perhaps, for the task of solving other tasks. An artificial intelligence must be capable of solving problems it has never encountered before. In that sense, in my opinion, programs whose sole purpose is to pass the Turing test should not be considered intelligent (then again, I don’t think that the Turing test is a definitive way to decide artificial intelligence in any case).
You might get a bit angry and interject, “but a Turing-test-solving machine may encounter questions and dialogues that it was never pre-programmed to answer!” To this I say that such a machine is closer to a linear-equation solver, which, much to its delight, was just given as input a set of coefficients it had never seen before. I think passing the Turing test is much closer to that side of the spectrum than to the “solve types of problems it never saw before” side. You might say that a general problem solver is the same, only with problems as inputs instead of coefficients, but there is a vast difference between “solve the problem of talking to humans” and “solve the problem of solving problems”.

One thing I do take from the Turing test is the emphasis it puts on humans. We don’t really work hard on creating a machine that is indistinguishable from a dog, or, say, an ant. I bet most people are willing to crush ants, but are less willing to crush dogs, partly because ants are much more mechanistic than dogs. But humans, now, those are a different matter. How does a human work? What inner mechanisms drive thought forward? How can it be that a set of firing neurons can distinguish between continuous and discrete infinities? We don’t know, and herein lies our intelligence.
To put it plainly: if we have an algorithm for solving new problems, and we understand how that algorithm works (perhaps, by some miraculous wonder, we also proved that it is correct!), then a machine implementing that algorithm is not intelligent. How would it be any different than the machine implementing heapsort?
Now, eventually, all machines (both silicon and carbon based) can be reduced to transistors opening and closing / if-else clauses / neurons firing, so we have to be careful here – reducing a machine to mechanical actions is not enough to deprive it of its intelligence – indeed, I do not claim that there is some non-physical or non-algorithmic process involved, or that our intelligence is found in some sort of “soul” or whatnot. Rather, the keyphrase here, is “we understand how that algorithm works”. To put it in other words, intelligence is subjective.

Consider our favorite learning technique: artificial neural networks. We create a large batch of randomly connected neurons, and give them zounds of data sets to learn from. At the end, the weights between the neurons are such that the network is capable of solving the specified problem rather well.
But given the network, we cannot (well, at least I cannot) say that we understand why it works. It’s just the way the weights, the reverse-feedbacks, the interconnected relations turn out. Hell, we even started with a random network! We have programmed the network using our design; we understand the learning algorithm; but the network itself? Given the connections and weights, it’s very hard to say why these particular values work; it’s also hard to say why these are the ones that we got, and not any other ones.
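To make the opacity concrete, here is a toy version of that process – a one-hidden-layer network trained on XOR by plain gradient descent, in pure Python. The architecture, learning rate, and epoch count are arbitrary choices of mine; the point is that after training, the individual weight values answer to no one.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR: the classic problem a single neuron cannot solve
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Random initial weights: 2 inputs -> 2 hidden units -> 1 output (plus biases)
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

initial_loss = loss()
lr = 0.5
for _ in range(5000):
    for x, y in data:
        h, o = forward(x)
        # Backpropagation: the chain rule, written out by hand
        d_o = 2 * (o - y) * o * (1 - o)
        for i in range(2):
            d_h = d_o * w_o[i] * h[i] * (1 - h[i])
            w_h[i][0] -= lr * d_h * x[0]
            w_h[i][1] -= lr * d_h * x[1]
            w_h[i][2] -= lr * d_h
        w_o[0] -= lr * d_o * h[0]
        w_o[1] -= lr * d_o * h[1]
        w_o[2] -= lr * d_o

final_loss = loss()
print(initial_loss, "->", final_loss)
print("hidden weights:", w_h)  # now try explaining *why* these particular values
```

We understand every line of the training loop; the final weights, not so much.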

I am aware that there is a not-so-comfortable disadvantage in this argument, stemming from its conclusion. If ever we were to reduce, in totality, the human brain to a series of provable algorithmic steps – we would be forced by this criterion to revoke our own status of intelligence. That would totally suck for our collective ego. Also, if intelligence is subjective – if ever there was an alien race which could comprehend our brain – they would consider us mindless automata, while we currently consider ourselves the prime of all Creation. Humbling, isn’t it?

The conditions I give are necessary, but I would not hasten to call them sufficient. If you merely obfuscate your algorithm, making it incomprehensible, that wouldn’t make the machine running the code any smarter. Likewise, solving new problems by brute forcing through a gargantuan configuration space does not grant further IQ points.
If you show me a neural network program which successfully solves a large set of new unencountered problems, I’ll probably agree on calling it intelligent. But most things less than that are either too narrow, or too comprehensible.
What good is this definition? I don’t know – what good is any definition of artificial intelligence? We are still far away from machine ethics, and until then, such definitions have no practical use. Perhaps it would prove useful if we ever show, mathematically, that the brain contains inherently unprovable algorithms. But that’ll take a while.

A short note on large numbers

March 4, 2014

It is not very often that we encounter large numbers in mathematics; in fact, in some fields it’s sometimes quite common not to encounter numbers at all. Most numbers you meet are relatively benign and harmless: 0, 1, 2, 3, e, π; you don’t need much apart from those. In any case, even to an aspiring toddler, they are not large.
But sometimes there naturally appears a number that is obscenely large. Well, it depends on what you mean by naturally, I suppose, for there are so many possible questions to ask, some of them must give large results. But as far as I have seen from my studies, usually when you ask nice questions, you get nice answers (well, if you get answers at all). As an example, the infinite sum \sum_{n=1}^{\infty} \frac{1}{n} diverges, but the infinite sum \sum_{n=1}^{\infty} \frac{1}{n^2} converges to \frac{\pi^2}{6}. Somewhere along the way, you can find a value of α such that the sum \sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} converges to any value greater than 1 that you want, but that would be picking a rather specific target, and the problem sounds contrived. Asking about n and n^2 seems very natural, though.
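A quick numerical check of those two sums, in Python (partial sums only, of course – the point is just how differently they behave):

```python
import math

def partial_sum(alpha, n_terms):
    """Partial sum of 1/n^alpha for n = 1 .. n_terms."""
    return sum(1.0 / n ** alpha for n in range(1, n_terms + 1))

# The harmonic series grows without bound (though glacially: ~ln(n))...
print(partial_sum(1, 10**6))

# ...while the sum of inverse squares settles near pi^2/6 ~ 1.6449
print(partial_sum(2, 10**6), math.pi**2 / 6)
```

A million terms of the harmonic series barely pass 14, yet it still diverges; the inverse squares are already correct to five decimal places.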
But today I ran into the prime counting function – π(x) – which counts how many primes there are from 1 to x. It’s not that easy to calculate, but we can use approximations. A relatively-ok one is called li, for logarithmic integral, defined as

li(x) = \int_{0}^{x} \dfrac{dt}{\ln t}.

It’s not an entirely bad approximation: here you can see the number of primes, and the li(x) function in the same graph:


It may seem by looking at the data that li(x) is always larger than π(x), but this is not the case. In fact, the quantity li(x) – π(x) changes sign infinitely often (proven by Littlewood). At what value does the first sign change happen? Meaning, what is the first x such that π(x) > li(x)?
We don’t know, and whenever we don’t know the exact answer, we try to bound it from above and from below. Lower bounds are found by computing both functions for increasing x values, while upper bounds require theoretical results.
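Computing both functions for small x is easy enough. A sketch: π(x) with a simple sieve, li(x) by naive midpoint integration starting at 2 (the constant li(2) ≈ 1.04516 patches in the integral below 2):

```python
import math

def prime_count(x):
    """pi(x): count the primes up to x with a sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    return sum(sieve)

def li(x, steps=100000):
    """Numerically integrate dt/ln(t) from 2 to x, then add li(2) ~ 1.04516."""
    dt = (x - 2) / steps
    total = sum(dt / math.log(2 + (k + 0.5) * dt) for k in range(steps))
    return total + 1.04516

for x in (100, 1000, 10000):
    print(x, prime_count(x), round(li(x), 1))
```

For these tiny values, li(x) sits comfortably above π(x) – which is exactly why the first crossing is so far out of reach.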
Currently, we know that the first sign change happens at

10^{14} < x < e^{727.95135}

I find this surprising.
First, the current upper bound is enormously large. There is always room to wonder when we encounter numbers in proofs that are larger than the number of atoms in the universe. It may be just an artifact of the proof, the real number may be much smaller, but for me, it certainly causes eyebrow movement.
Second, the lower bound isn’t minuscule either. For a phenomenon that happens infinitely often, it sure is taking its time. Many interesting prime properties can be found early on in the prime sequence (I suppose that’s how we bothered looking for them in the first place), but this one cannot. You may say, of course, that the question isn’t that natural – after all, li(x) is just another approximation function – but as an approximation, its relative simplicity and straightforwardness are quite attractive.
Primes are rooted deep in number theory; if we wish to understand mathematics, we must understand primes.
And this causes me to ponder:
Mathematics does not care about our lovely intuition of what constitutes a “big number”. It laughs in the face of the adorable humans, trying to understand the world within their own references and scales. There is an infinite amount of natural numbers. Period. There is an infinite amount of primes. Period. Name a number as large as you like, but once you name it, know that it is insignificant and tiny when compared to the vastness and infinity of those that follow it. It is hard to comprehend just how many natural numbers there are; perhaps the real surprise should be the sheer amount of results we have that do not involve incredibly large numbers.
Don’t you find that reassuring?

The Perfect Match

February 14, 2014

Valentine’s day is upon us, a cheerful reminder that the main purpose of every living creature is to copulate as much as possible and spray the world with its offspring; the more, the merrier.
This is certainly easy enough for all those twice blessed prokaryotes; twice blessed, first, for their asexual reproduction via mitosis, obviating the need for a mate; second for their lack of consciousness, avoiding the misery induced by the absence of aforementioned mate.
However, for the rest of us, finding that special someone is tricky business; it’s a game of both chance and skill, bluffing and honesty, combinatorics and probability (well, if you’re into that kind of stuff). It’s no wonder that the world is filled with attempts to make the whole process easier: dating sites, religious / traditional matchmaking, random-blind-date-generators, etc. But in the end, it all comes down to basic mathematics.

Let’s look at some of the well known algorithms and theorems for matching couples under different conditions. In quite stereotypical behaviour, mathematicians always assume that it’s possible to rank others by order of preference in a unique and unambiguous way, or that you can reject a suitor at point blank range without remorse, stating that “Sorry Billy, Roger is 0.2 [pretties] more handsome than you”, only to dump Roger two days later because the next person in the list comes along.
Oh well. Mathematics, thou art a harsh mistress.

Hall’s Marriage Theorem:
Our first theorem discusses whether it’s possible to get everyone married in a group of n men and n women, assuming that not everyone is willing to marry everyone else (sounds reasonable). The procedure is this: for each man and woman, ask them if they see themselves as a married, middle-aged couple. If so, then they are potential wedlock candidates; otherwise, you won’t pair them up. Equivalently, you could ask every man to make a list of the women he would be ok marrying, then have the women look at the lists and cross themselves out if they don’t agree.
The result can be given as a bipartite graph:


Each of the upper nodes represents a woman, each of the bottom nodes represents a man. An edge means that both are ok with an eternal life together. It’s quite possible that a very attractive, high quality person will have many edges, meaning, a lot of people are interested in marrying him/her, while other people might have few or no edges at all. In the above example, Lisa is very popular – everyone wants to marry her, and she agrees to marry everyone – while Fae only has one possible mate. Thus, some people have more options than others, and the question is, can we assign couples, meaning, pick edges, so that *everyone* gets married in the end?
A simple case where the answer is a straightforward yes is:


Each man is connected to one woman, and vice-versa – meaning that the couples are already picked here, and we have no real choice.
A case where the answer is no is:


You can try picking out couples and seeing that a solution indeed does not exist.
Hall’s marriage theorem states that a complete matching exists – meaning, everyone can be paired – if and only if the following condition holds: look at all the possible subsets of men (i.e., first look at all the men individually, then at all the pairs of men, then at all the triplets, etc.). For each subset M, count the number of different women its members could marry between them, G(M). Then a complete matching exists if and only if |M| <= G(M) for every subset M.

We can see that the condition holds in the first case – each man is connected to exactly one distinct woman, so every subset M satisfies |M| = G(M), and we have a matching. In the second case, however, the condition does not hold: look at Joe, Frank, and Mark: the three of them have only two potential mates between them, Jill and Lisa, so |M| = 3 > 2 = G(M). The condition fails, and indeed, there is no matching.
A proof of this theorem (actually a slightly more generalized one) can be found in wikipedia. Notice that the theorem doesn’t tell you at all how to find the matching, only that it exists.
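The condition is easy to check by brute force on small examples. A sketch in Python (the names and edges below are made up for illustration, not taken from the pictures above):

```python
from itertools import combinations

def hall_condition(accepts):
    """accepts: dict mapping each man to the set of women who'd marry him.
    Check |M| <= G(M) for every nonempty subset M of the men."""
    men = list(accepts)
    for size in range(1, len(men) + 1):
        for subset in combinations(men, size):
            # G(M): all women reachable from this subset of men
            women = set().union(*(accepts[m] for m in subset))
            if len(subset) > len(women):
                return False
    return True

# Everyone can be matched here...
ok = {"A": {"x", "y"}, "B": {"y"}, "C": {"x", "z"}}
# ...but here three men compete for only two women
bad = {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"x", "y"}}
print(hall_condition(ok), hall_condition(bad))  # True False
```

Note that this checks all 2^n subsets, so it is exponential – fine for a dinner party, not for a census.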

Happy Marriage: The Hungarian Method:
Suppose now that our n bachelors and bachelorettes have had enough of the single life, and they decide that everyone should get married, no matter what. So instead of asking every pair whether or not they agree to marry each other, we ask them to rank the match. The result is a number for each (man, woman) pair; the higher it is, the more miserable they will be with their marriage. We want to find a matching between all men and women, such that the overall happiness is highest, meaning that the sum over all rankings of the chosen pairs is the lowest it can be.
A way of looking at this is the following: create a n x n matrix, where the (i,j) entry contains the ranking of matching the i-th man with the j-th woman. An example might be as follows:


So a marriage between Frank and Anne would be a happy one, while a marriage between Joe and Jill would be rather miserable in its own way.
The question reduces to this: we need to choose n squares overall, each one in a different row and a different column, so that the sum of all the chosen squares is minimal.
The simplest way to solve this is the brute force approach: just go over all the possible matchings, and pick the best one. But when there are n men and n women, there are n! different possibilities. Try doing this for 30 men and women; tell me about it when the universe ends.
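The brute force itself is barely a few lines of Python – fine for tiny n, hopeless for 30 (the matrix below is my own toy example, not the one from the picture):

```python
from itertools import permutations

def best_matching(cost):
    """Try all n! assignments of men (rows) to women (columns);
    return the cheapest total and the chosen column for each row."""
    n = len(cost)
    return min(
        (sum(cost[i][p[i]] for i in range(n)), p)
        for p in permutations(range(n))
    )

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, assignment = best_matching(cost)
print(total, assignment)  # 5 (1, 0, 2): row 0 -> col 1, row 1 -> col 0, row 2 -> col 2
```

Three people means 6 candidate matchings; thirty people means roughly 2.65 * 10^32.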
Something more clever is needed, and indeed, there exists an algorithm called the Hungarian Algorithm which can solve it in O(n^3) moves. In general, it relies on the fact that you can add or subtract any number from an entire row or an entire column of the matrix, and this will not change the optimal matching. Think about it this way: whichever matching you choose, you have to pick exactly one entry from each row, so adding a number to an entire row increases the sum of every possible matching by that same amount – the comparison between matchings is unchanged.
The aim of the Hungarian algorithm is to add and subtract values from the different rows and columns in a clever way, so that the values in the matrix are always non-negative. Eventually, as if by magic mathematics, there appears an arrangement of zeros that can be picked as a matching. The algorithm is not simple, and will not be given here; conveniently, it can be found at this nice site (in the context of assigning jobs to workers, but hey, once you get to a matrix formulation, the mathematics is all the same).

Stable Marriages:
The previous algorithm gave us a global optimum for happy marriages – when we summed the total “miserableness” of the matching, it was the lowest possible among all possible matchings. However, it may be that one couple was a very bad match, and they live grumpily and miserably, but the overall matching is still the minimum. We will now ask the question – can we make everyone happy locally?
Of course, we need to ask what that means. One interpretation is this: suppose that we match up all the couples. Some of them may be dissatisfied, and wish to break apart from the shackles of holy matrimony. If a woman desires another man more than she desires her own husband, and that man also desires that woman more than he desires his own wife, then they may each get a divorce from their current partners and marry each other instead. We say that the matching we initially had is “unstable”. If no such pairs exist, then the matching is stable.
The problem is then the following: suppose we ask each man to create a list, sorted by preference, of all the women, and we ask each woman to create a list, sorted by preference, of all the men. Can we create a stable matching, i.e., one in which no two people will agree on replacing their partners?
It turns out that we always can, and rather efficiently, too. The basic algorithm for doing this is called the Gale-Shapley algorithm. It solves the problem in up to n^2 iterations, as follows: in the first iteration, all the men approach the woman first on their list – the one they want the most. Some women may get a lot of suitors, while others may get none at all. This is ok. Each woman, from all the possible suitors, picks the man she likes best, and sends the rest away. So it goes. During every subsequent iteration, all the rejected men go to the next woman on their list, with the women picking between the new guys and the one they chose the previous iteration. The process continues until no one is rejected (this is bound to happen eventually; think why).
In the end, everyone gets matched up, and it can be proven that the matching contains no unstable couples. Of course, the scheme can also be reversed, with the women approaching the men (it turns out that this yields asymmetric results, with the active suitors getting matches that are better for them. Conclusion? It’s better to ask people out than to wait to be asked out).
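The whole algorithm fits comfortably in a few lines of Python. A sketch, with made-up preference lists (most preferred first; each list is assumed complete, so no man ever runs out of women to ask):

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose; each woman tentatively keeps the best offer so far."""
    # rank[w][m] = how much woman w likes man m (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index into each man's list
    engaged = {}                             # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free_men.append(engaged[w])      # the old fiance is dumped
            engaged[w] = m
        else:
            free_men.append(m)               # rejected; he'll try his next choice
    return {m: w for w, m in engaged.items()}

men = {"joe": ["lisa", "anne"], "frank": ["lisa", "anne"]}
women = {"lisa": ["frank", "joe"], "anne": ["joe", "frank"]}
print(gale_shapley(men, women))  # {'frank': 'lisa', 'joe': 'anne'}
```

Here both men want Lisa, Lisa prefers Frank, so Joe moves on to Anne – who happens to prefer him anyway. No one would trade.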

Final remarks:
Feel free to assemble all your single friends and try these algorithms out. I’m sure it will create no social anxiety whatsoever, and in the end, everyone will be matched up! Hurray for mathematics.
Happy Valentine’s day.

* Note: in the problems introduced we always assumed the happy case, of an equal number of men and women. Actually, all three problems can be extended to unequal numbers. For Hall’s theorem, instead of asking if everyone can be paired up, we can ask if all the men can find wives, or, conversely, if all the women can find husbands; the theorem stays the same. For the Hungarian algorithm, if there are, for example, more men than women, we can add fictitious “unmarriageable women” until the numbers are equal, and heavily penalize choosing them – no man would want to stay alone. The Gale-Shapley algorithm stays the same, knowing that not all participants will get a date at the end.
To sum it up and state the obvious: when there is a numerical bias towards one gender, someone always gets left out, resulting in mathematical misery. The Technion, having a whopping 35% female minority out of its 13,000 students, is a fine example of this (that’s actually a generous average; in the physics and mathematics faculties, it’s about 20%…).

Top English is more readable than Bottom English

December 20, 2013

The coolest thing about the following text is that you can read it.


Indeed, it has been known for quite a while that people hardly read individual letters. Familiar words are instantly recognized, and idioms, common phrases, and logical extensions from context all make reading much faster than if we had read words letter by letter. Just try reading something in a foreign language with Latin script, or handling a book with plenty of new vocabulary.
How much of each line can we crop while still maintaining readability? Of course, if we reduce each line to just a few scattered dots, we won’t be able to infer anything about the words; it’s natural to assume that there exists a steep threshold at which the text becomes unreadable. By unreadable, I actually mean, hard to read fluently. Perhaps we can use a computer algorithm to create a signature for the top of each letter and try to infer words based on English statistics and Markov chains; perhaps if we sit for 5 minutes per word we can sort it out; but that’s beside the point. The point is fluid human readability.
The obvious thing to do would be to write a Python script that takes some text, and crops its lines by a given percentage. The above picture shows what happens when you remove the bottom 50% of each line. The full line markers can be seen in green and yellow on the left hand side. For me, cropping each line to about 1/e of its normal height is already too much, and while some words can be inferred, in general the text cannot be fluently read. So perhaps, for this font, the threshold is somewhere between 50% and 37%.
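The cropping itself is trivial. Here is a toy version that works on an ASCII “image” (a list of pixel rows) rather than an actual bitmap – the real script does the same thing to rows of image pixels via an imaging library:

```python
def crop_bottom(rows, keep_fraction):
    """Keep only the top keep_fraction of a block of pixel rows."""
    keep = max(1, int(len(rows) * keep_fraction))
    return rows[:keep]

glyph_line = [
    ".##..###.",
    "#..#.#..#",
    "####.###.",
    "#..#.#..#",
    "#..#.###.",  # 'AB' rendered in a 5-pixel-tall toy font
]
for row in crop_bottom(glyph_line, 0.5):
    print(row)
```

Cropping a line of rendered text is exactly this, applied to each text line of the page image in turn.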


One can also cut from top, and be left with only the bottom part of each sentence. To me, this is less readable, mainly because of all the i’s, h’s, n’s, and m’s mixing together at the bottom.


So Top English is more readable than Bottom English!

This is probably very font-dependent. In some fonts the letters are more distinct, and are therefore easier to recognize even when cropping them. In others they are very similar and just a slight cropping reduces the text to an intangible mess of lines. I suspect that Sans Serif fonts are harder to read than Serif fonts, but I have not tested this; it may also depend on the individual reading the text.
In any case, this yields an interesting metric for rating different fonts: their robustness to letter erosion. A font would be said to be more robust, if its cut-off threshold for readability is lower. Of course, we are not constrained to just removing the bottom or top half: here is the same text, with only a random 40% of the pixels intact (60% of the pixels were erased):


At a deletion rate of > 70%, it’s already very hard to read the text, but we see that for 60% it’s relatively ok. So different pixel-erasing methods can yield different thresholds, and this can also be taken into account when trying to calculate the font robustness.

Two final points:
1) The program I wrote works equally well for Hebrew, although for the text I used, 50% already gave me a hard time:


2) It does take a bit of a mental effort to read the cropped text. If you look at the text without trying to read it (and, say, squint your eyes a little), it looks like a foreign script (at least to me)! In fact, if you want to quickly generate an unintelligible script for a movie / animation, you could crop the lines in a similar fashion, and flip the letters horizontally / vertically. I suspect this will be enough so that at first glance, no one will recognize the origin.

Whatshisname’s Theorem

December 12, 2013

Maybe it’s because I recently read Contact and 2001: A Space Odyssey; perhaps it’s the glorious recent exoplanet reports from analysis of Kepler data; or it could just be random fluctuations; but lately I’ve been thinking about how lovely it would be to finally establish first contact with an alien civilization.
As fun as intergalactic thermonuclear war, xenocide, and total annihilation of either the alien (in the worst case) or of the human (in the best) race may seem, I wishfully hope that we will at once embark on a massive effort of translation to and from their major languages and ours. This will probably serve as a delight to my linguistic friends, but of course it would only be the precursor to the tremendous and virtually endless transfer and sharing of scientific knowledge.
Think of all the possibilities! Whether it is they or us with the more advanced technology, who knows what we may discover? Could it be that their astronomers, having lived in a different solar system, know more about the formation of stars? Could it be that they have found an intuitive and reasonable interpretation to the philosophical mess behind quantum mechanics? Could it be that they have journeyed through mathematical landscapes of which we cannot even dream? I think it’s safe to assume that a civilization capable of building spaceships has discovered at least some kind of calculus, but in general the history of mathematics is riddled with accidents and coincidental branches; any flip of the coin could have generated new fields, and who knows where we would be today had the bullet missed Galois’ tender stomach and let him live another day.
But we are remarkably unprepared for the day of sharing mathematical knowledge. Sure, we’ll have to make them understand what is right and what is wrong and what is right and what is left and a myriad of symbols and definitions, but one thing in particular stands out to me: the naming of our theorems.
Why, just today I had a problem in which I needed to calculate the electric flux through a sphere around an electron. It was a piece of cake, actually; I just had to take the Riemann integral of Coulomb’s field using Gauss’ theorem. Later, d’Alembert’s criterion once again proved useful after Abel’s test failed me, and I could find Euler’s function in no time.
Now, to deny the greatness of these people would be simply outrageous and totally unthinkable by all moral standards. However, I would like to think that mathematics (and physics) have about them a timeless, standalone, and civilization-independent quality. Honor granted, as a language mathematics should strive to be lucid and clear; part of this involves naming conventions. And famous-dead-guys’ names simply indicate nothing about the theorem or item which they describe.
Consider a computer program with millions of lines of code spread across thousands of files, performing in many different areas such as graphics, numerical calculations, and physical output. Does this seem like a valid function name?

“int joes_first_function(int x, int y);”

Of course, this appears in “algebra.h”, and it relates to the sizes of finite groups. After looking a bit at “analysis.h”, you again see

“int joes_first_function(double x, double y);”

This one has to do with derivatives. Since no one would ever use both algebra and analysis together, it’s perfectly reasonable, isn’t it? Anyone in the field will know what you are talking about anyway. After all, Joe was a famous computer programmer, and you learned about his functions as a college undergrad.
Indeed, as a computer program, mathematics would have a horrible naming problem. Sure, there would sometimes be something self-descriptive, perhaps “bool existence_and_uniqueness(ode)”, or “apply_triangle_inequality”, but for me, it’s mostly Euler, Abel, Gauss, Lagrange, and all their Enlightened friends (of course, a similar phenomenon exists in more modern mathematics as well).
Are our alien friends supposed to memorize our mathematicians’ names and cross correlate them to their fields of study? Should we write in our books that Jxxkarathrozmsq’s theorem, proved by Proxima Centauri 17(b)’s renowned mathematician, is actually just a particular case of Euler’s identity?
I say, nay! To each purely personalized theorem, let us formulate a descriptive name, which at least gives us some clue as to its usage and field. To every unrecognized but named item, attach an indicative and translatable label. Eventually, after many more years of research, and after having met and shared with more and more foreign civilizations, our Complete Galactic Compendium of the Milky Way Mathematician will be complete, devoid of bland and forgotten cultural references, fully expressing the absolute nature of Mathematics.

A short note about amino acids and Hamming distance

October 26, 2013

It’s always interesting to see how work in one area of mathematics seems to help or imply results in a totally different field. Not long ago, I read a bit on the problem of random packing – if you park cars randomly on a street, how many cars can you fit in before you run out of space? During their work on this problem and its related variants, mathematicians Itoh and Solomon discovered a seemingly surprising connection to… molecular biology.


Human DNA / RNA is composed of 4 nucleotides, denoted C,G,A,T (for DNA) or C,G,A,U (for RNA). When the cell creates proteins, it reads these in triplets called codons; each codon codes for either a start/stop instruction, or for a single amino acid.
Since there are 4 types of nucleotides, there are 4*4*4 = 64 different combinations; potentially, each one could code for a different amino acid. However, human proteins are built from only 20 amino acids. The “start” command also generates an amino acid, so this, in conjunction with the “stop” command, means the 64 nucleotide combinations only describe 21 functions. There is therefore some redundancy, and several combinations describe the same acid / command. The following table summarizes which acid is encoded by which triplet:


There are several interesting phenomena in this table. I am no professional in the field of biochemistry, but I will give my humble thoughts on why they occur.

First, note that the amino acids tend to be encoded in clusters. For example, Proline is encoded by CCU, CCC, CCA, or CCG; in other words, any codon that begins with CC. Two questions may arise: 1) Why are the triplets so similar to each other – why aren’t they scattered throughout the entire table? 2) Why is there variation only in the last nucleotide? Why don’t we see acids being encoded by “any codon that ends with CC”?
The answer to the first is probably rooted in resistance to mutations. Suppose that due to a random mutation, one of the letters of the codon changes. It could be quite harmful if the amino acid encoded would also be different, since this can change the protein’s function. If the mutation is random – i.e., there is an equal chance for every position to change into every other letter – then if we have an amino acid encoded by “everything that starts with CC”, it is resistant to 33% of the mutations, since any mutation changing the last letter would not change the encoding. However, if the same acid had codons that are all different by two or three nucleotides, chances are that any mutation would change it.
This can be stated more formally in terms of Hamming distance, which counts the number of positions at which two strings differ. For example, the distance between CCA and CCU is 1, while the distance between CCA and GAU is 3. Codons which are further away from each other have a smaller chance of turning into one another through mutation, and scattering an acid’s encoding across the whole table would mean a large Hamming distance from one of its codons to another.
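A minimal Python version of this measure, checked against the two examples above:

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("CCA", "CCU"))  # prints 1
print(hamming("CCA", "GAU"))  # prints 3
```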
The answer to the second question – why the variation sits in the last nucleotide specifically – I do not know, and can only speculate. One idea is that in the distant past, organisms were simpler and used fewer building blocks. Specifically, codons were of length two, not three, and in the transition from two to three, a nucleotide was appended at the end. This idea has many “maybes” in it. Another idea, due to my friend Eyal, is that the ribosome, which is responsible for putting proteins together, has lower accuracy after it has already processed two nucleotides, and hence we want redundancy in the third.

A second question regarding the distribution of encodings in the table above is: why are there 21 different functions? Why not more, and why not fewer? It is here that a mathematical model may provide some insight, as found by Itoh and Solomon.
A codon may be written as a sequence of three letters out of the four letters CGAU. We can also write it in binary as a string of length six: C = 00, G = 01, A = 10, U = 11. Consider all 64 different binary strings, which correspond to all the possible codons. Choose one at random and write it down. Now keep choosing strings at random from the ones you haven’t chosen yet, but write one down only if its Hamming distance from each of the strings you have already written is at least 2. At the end, all the strings you have written down will differ from each other by more than 1 bit. Sometimes the process ends after you have written only 14 strings; other times you reach a full 32 (one for each opposite corner of the six-dimensional hypercube generated by the strings).
The natural question to ask is: if you repeat this process, what is the average number of strings you eventually write down? I did not solve this analytically, but I did write a Lisp program to calculate it. The average over 10000 runs turns out to be about 20.1 – strikingly close to the 21 different functions encoded by codons…
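My program was in Lisp, but the experiment is easy to reproduce; here is an equivalent sketch in Python, following the procedure described above (a 6-bit string is just an integer from 0 to 63, and the Hamming distance between two of them is the popcount of their XOR):

```python
import random

def one_run():
    """Greedily collect 6-bit strings (C=00, G=01, A=10, U=11) in random
    order, keeping a string only if it differs from every already-kept
    string in at least 2 bit positions. Returns how many were kept."""
    kept = []
    for s in random.sample(range(64), 64):  # all 64 codons, shuffled
        if all(bin(s ^ t).count("1") >= 2 for t in kept):
            kept.append(s)
    return len(kept)

runs = 10000
average = sum(one_run() for _ in range(runs)) / runs
print(average)  # comes out near 20, close to the 21 codon functions
```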
Of course, this may be entirely coincidental. Whatever way the modern genetic apparatus developed, it was certainly more complex than picking out random strings and checking whether they are close to each other. Further, a Hamming distance of 2 may still correspond to codons that differ by only one nucleotide, since each letter takes two bits. However, one cannot help but think that there is indeed a connection, for as we established above, resistance to mutations comes from having a large Hamming distance between different amino acids. In any case, this is just a quotation of a result, not a proposed model.
Still, it is interesting to see that our own genetics are basically governed by the same considerations and principles we use when designing an error-resistant communication system and its encoding. The fact that basic research on the random parking of cars (eventually, in more than one dimension) led to insights about information is also a comforting reminder that you never know where you’ll end up when tackling a seemingly innocent problem.

Some things never change…

August 23, 2013


