Of Mark and of Biology

As some of you might know, I wanted to take biology in high school but chose physics instead, for various reasons. That did not lessen my interest in biology, and I still find the subject fascinating, especially where it touches on microbiology or physics. Over the weekend I wrote two short articles, presented below. They are somewhat technical, but also interesting, so please do try to read them. The first is on the isoacceptor and its roles, and the second is on Thermophilus proteins and how they affect various processes within the cell.

1) Review of the Isoacceptor TRNA
The isoacceptor trnaleu anticodon gag reads the nucleophilic group of how the cognate trnas can interact with proteins such as the ribosome crystals. From this, an antisecretory effect – it reduces the bacteria archaea, and a stable n aminoacylated minihelix – functioning as reflected by the sequential model of base pair of the peptidyl – transfers trobro and sugars in the catalytic mechanism of the geometry of the s subunit. From the next report from these differences, the p sitebound trna is in termination, and s crystal structure of the codon reading and trnas enter the first identified low resolution. In analogy with rf, and how they induce ester bond hydrolysis of numerous bundles of the e in the ribosome, making this reaction site and a post antibiotic drugs and s subunit works. They inferred a survey of inhibition allosteric inhibitors, although comparison of less stringently monitored explaining the s and qvist established with the structures is a site. While leaving humans, ribosomes differ: most frequent occurrence of near cognate asl alone. This may reflect a protein synthesis such as the situation changed it recruits – all other bacterial re growth – this quest against resistance mutants is not shown at all.

2) Thermophilus Proteins and the Cell
The proteins carry out of the regulatory molecules as essential, but there is the cell that can receive one of the aminoglycosides. Subsequently, the s ribosomal proteins i.e. site – while those in figure seemed to their susceptibilities tested, aminoglycosides are shown in the streptomyces along with non cognate asl alone and intramuscularly. Some promising successes in recent experimental studies show their large proteins edit ribosome and might be administered and effective drugs, and it was also induced to fit together ribosomes being either one part in the surface, and collaborators ogle, and site bound in the closed conformation. Increased rate of the united statesthermophilus proteins can be found in the activation entropy. This could be explained as well as a site aminoacyl trna. In summary, the closed conformation is awarded to stabilize the archaeon h marismortui since the minor groove of one amino acid and crescentin filaments are for the high resolution, and may display symmetry model put forth by the microsome fraction. This is termed absolute subtype of gtp hydrolysis in a number of ’oh of s subunit for subsequent structure without disrupting or deletions in the ribosome with s bound, when bpg binds to three dimensional chromosomes.

You may notice that these articles aren’t very easy to read or understand, and you might blame that on their very technical nature. The real reason, however, is that they were not written by me at all, but by a very simple Markov chain. Markov chains are models in which the move from one state to the next depends only on the current state, and not on the history of how the model got there. Here, this means that we construct a scientific article one word at a time, each time based only on the word we have just written. The next word is chosen randomly; however, we do not just pick a random word from the English dictionary, but draw it from a probability distribution. Before writing the articles, we give the program many other articles to “read” and analyse (in this case, several Wikipedia pages about various proteins). During this process, it notes which words tend to follow which. For example, in academic biology, the phrase “protein synthesis” is much more common than the phrase “protein cats”. When writing the article, if our current word is “protein” and we want to choose the next one, we are much more likely to choose “synthesis” than “cats”, so as to produce a paper that mimics what university professors write. In essence, we start with a single word or phrase and slowly, word by word, build up the article.
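To make the counting step concrete, here is a minimal sketch, separate from the script at the end of this post. The toy corpus and the names `next_word_counts` and `next_word_probabilities` are invented for illustration; the idea is simply to count which words follow each word and turn those counts into a probability distribution.

```python
from collections import Counter, defaultdict


def next_word_counts(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for word, following in zip(words, words[1:]):
        counts[word][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follower counts of `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}


# Toy corpus, made up for this example (not the Wikipedia pages).
corpus = ("protein synthesis requires the ribosome "
          "protein synthesis is fast "
          "protein folding is slow "
          "protein synthesis stops")
counts = next_word_counts(corpus)
probabilities = next_word_probabilities(counts, "protein")
print(probabilities)
```

In this toy corpus “synthesis” follows “protein” three times out of four, so it gets probability 0.75, while a word that never follows “protein”, such as “cats”, gets no entry and is never chosen.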

The model I have written is very simple, and does not claim to be comprehensive, robust, efficient or complete. It is a “depth 1” Markov model, meaning that the next word is chosen based only on the previous word. A “depth 2” model, on the other hand, looks at the previous two words, and so on. The higher the depth, the more coherent the sentences it produces, but also the more memory and CPU the program consumes. It is not difficult to produce a simple depth 2 model, but depth 1 was sufficient for me in this case. The model also does not deal with punctuation or grammar, although one can construct a model that does. Overall, Markov models are used extensively in industry and academia, since they can generate quite good results. More elaborate relatives, such as “Hidden Markov Models”, assume that the state driving the process cannot be observed directly and has to be inferred from the output (unlike this model, where the state is simply the last word). Other extensions track several variables at once: for example, one can build a program that writes music, with pitch, harmony, tempo and rhythm all being variables that must be considered.
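For comparison, a “depth 2” variant might look like the sketch below. It is not part of my script; the function names `build_depth2_table` and `generate_depth2` and the example sentence are assumptions made for illustration. The only real change is that the lookup key is a pair of words instead of a single word.

```python
import random
from collections import defaultdict


def build_depth2_table(words):
    """Map each pair of consecutive words to the words seen right after it."""
    table = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)].append(third)
    return table


def generate_depth2(table, start_pair, num_words, rng=random):
    """Extend `start_pair` one word at a time, looking at the last two words."""
    text = list(start_pair)
    for _ in range(num_words):
        followers = table.get((text[-2], text[-1]))
        if not followers:
            break  # this pair never occurred in the corpus, so stop early
        text.append(rng.choice(followers))
    return " ".join(text)


# Example sentence, made up for this sketch.
words = "the ribosome reads the codon and the ribosome moves on".split()
table = build_depth2_table(words)
sample = generate_depth2(table, ("the", "ribosome"), 5)
print(sample)
```

Because the table is keyed by word pairs rather than single words, it has many more entries for the same corpus, which is where the extra memory goes.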
All in all, I am quite happy with the results, especially considering the simplicity of the model. I made fewer than 20 edits to each of the above articles, most of which consisted of adding proper periods and commas. A serious model can do this automatically, and generate results which barely need to be edited in order to look reasonable. Notice that if you compare the articles to real academic research papers, you will probably be as confused reading one as reading the other. So a final word: do not let technical jargon and long, complicated sentences blind you to the fact that some people write useless and utterly unreadable crap.

The Python script:

import random


class Markov:

    def __init__(self):
        # Maps each word to the list of words seen right after it.
        # Repeats are kept, so frequent followers are picked more often.
        self.wordsDict = {}

    def addFileStatistics(self, filename):
        with open(filename) as f:
            text = f.read()
        # Assumption: the file contents are fed into the word statistics
        self._addNextWordStatistics(text)

    def _addNextWordStatistics(self, text):
        noCharactersText = text
        # Removing characters we don't want to deal with
        for character in ["e.g.", ".", ",", "'", "!", "?", ":",
                ";", "'", '"', "\\", "/", "`", "%", "(", ")",
                "-", "_", "*", "[", "]", "\n", "\r"]:
            noCharactersText = noCharactersText.replace(character, " ")
        words = noCharactersText.split()
        words = [word.lower() for word in words]
        for i in range(len(words) - 1):
            word = words[i]
            nextWord = words[i+1]
            if word not in self.wordsDict:
                self.wordsDict[word] = []
            # Record that nextWord was seen right after word
            self.wordsDict[word].append(nextWord)

    def generateText(self, startingWord, numOfWords):
        newText = [startingWord]
        lastWord = newText[0]

        for i in range(numOfWords):
            if lastWord in self.wordsDict:
                concurrents = self.wordsDict[lastWord]
            else:
                # Assumption: a word with no recorded followers falls back
                # to a random choice among all known words
                concurrents = list(self.wordsDict.keys())

            nextWord = random.choice(concurrents)
            newText.append(nextWord)
            lastWord = nextWord

        return " ".join(newText)
