Top English is more readable than Bottom English

The coolest thing about the following text is that you can read it.

moby_top_0.5

Indeed, it has been known for quite a while that people hardly read individual letters. Familiar words are instantly recognized, and idioms, common phrases, and logical extensions from context all make reading much faster than if we read words letter by letter. Just try reading something in a foreign language written in Latin script, or handling a book with plenty of new vocabulary.
How much of each line can we crop while still maintaining readability? Of course, if we reduce each line to just a few scattered dots, we won’t be able to infer anything about the words; it’s natural to assume that there exists a steep threshold at which the text becomes unreadable. By unreadable, I actually mean hard to read fluently. Perhaps we can use a computer algorithm to create a signature for the top of each letter and try to infer words based on English statistics and Markov chains; perhaps if we sit for 5 minutes per word we can sort it out; but that’s beside the point. The point is fluent human readability.
The obvious thing to do would be to write a Python script that takes some text and crops its lines by a given percentage. The above picture shows what happens when you remove the bottom 50% of each line. The full line markers can be seen in green and yellow on the left-hand side. For me, cropping each line to about 1/e (roughly 37%) of its normal height is already too much: while some words can be inferred, in general the text cannot be read fluently. So perhaps, for this font, the threshold is somewhere between 37% and 50%.
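To make the cropping step concrete, here is a minimal sketch of how it could look. It is not the script used for the pictures in this post; it assumes the page has already been turned into a binary image (a list of pixel rows, 1 for ink and 0 for background) and that the row range of each text line is already known. The names `crop_lines`, `line_spans`, `keep`, and `from_top` are made up for the illustration.

```python
def crop_lines(image, line_spans, keep=0.5, from_top=True):
    """Blank out part of every text line in a binary image.

    image      -- list of pixel rows, each a list of 0 (background) / 1 (ink)
    line_spans -- assumed input: (first_row, end_row_exclusive) per text line
    keep       -- fraction of each line's height that stays visible
    from_top   -- keep the top part if True, the bottom part otherwise
    """
    result = [row[:] for row in image]  # work on a copy
    for start, end in line_spans:
        height = end - start
        kept = round(height * keep)  # rows kept, as a whole number of rows
        if from_top:
            rows_to_erase = range(start + kept, end)
        else:
            rows_to_erase = range(start, end - kept)
        for r in rows_to_erase:
            result[r] = [0] * len(result[r])
    return result
```

Calling `crop_lines(image, spans, keep=0.5)` would correspond to the "top 50%" picture, and `keep=0.37` to the roughly 1/e case; passing `from_top=False` keeps the bottom of each line instead.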

moby_top_0.37

One can also cut from the top, and be left with only the bottom part of each line. To me, this is less readable, mainly because of all the i’s, h’s, n’s, and m’s mixing together at the bottom.

moby_bottom_0.5

So Top English is more readable than Bottom English!

This is probably very font-dependent. In some fonts the letters are more distinct, and are therefore easier to recognize even when cropped. In others they are very similar, and just a slight cropping reduces the text to an unintelligible mess of lines. I suspect that Sans Serif fonts are harder to read than Serif fonts, but I have not tested this; it may also depend on the individual reading the text.
In any case, this yields an interesting metric for rating different fonts: their robustness to letter erosion. A font would be said to be more robust if its cut-off threshold for readability is lower. Of course, we are not constrained to just removing the bottom or top half: here is the same text, with only a random 40% of the pixels intact (60% of the pixels were erased):

moby_random_0.4

At a deletion rate above 70%, it’s already very hard to read the text, but at 60% it’s relatively OK. So different pixel-erasing methods can yield different thresholds, and this can also be taken into account when trying to calculate the font robustness.
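Random erasure is easy to sketch as well. The snippet below is only an illustration under the same assumption as before (a binary image stored as a list of rows of 0/1 pixels); `erase_random_pixels`, `keep`, and `seed` are hypothetical names, and each pixel is kept independently with probability `keep`, so the surviving fraction is only approximately the requested one.

```python
import random


def erase_random_pixels(image, keep=0.4, seed=None):
    """Keep each pixel with probability `keep`; set the rest to background (0)."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    return [
        [pixel if rng.random() < keep else 0 for pixel in row]
        for row in image
    ]
```

With `keep=0.4` this mimics the "random 40% intact" picture; lowering `keep` toward 0.3 would be one way to probe where reading starts to break down.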

Two final points:
1) The program I wrote works equally well for Hebrew, although for the text I used, 50% already gave me a hard time:

hamutag_top_0.5

2) It does take a bit of mental effort to read the cropped text. If you look at the text without trying to read it (and, say, squint your eyes a little), it looks like a foreign script (at least to me)! In fact, if you want to quickly generate an unintelligible script for a movie / animation, you could crop the lines in a similar fashion, and flip the letters horizontally / vertically. I suspect this will be enough so that at first glance, no one will recognize the origin.
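A rough sketch of the flipping idea, again on a binary image stored as a list of pixel rows. Note the simplification: this flips the whole image rather than each letter separately, so a horizontal flip also reverses the order of the letters. The function name `flip` and its parameters are made up for the example.

```python
def flip(image, horizontal=False, vertical=False):
    """Mirror a binary image left-right and/or top-bottom (returns a copy)."""
    rows = image[::-1] if vertical else image
    return [row[::-1] if horizontal else row[:] for row in rows]
```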


3 Responses to “Top English is more readable than Bottom English”

  1. abraham Says:

    Nice crop… what happens when you crop left or right?

  2. Lucasmarquardt Says:

    How do you crop the text?

    • physicalpandemonium Says:

      I wrote a script in Python that does it for me. I save the text as a picture, then scan it pixel row by pixel row. I cluster rows together based on whether two consecutive rows contain part of a letter (this assumes that each line of text contains at least one continuous connected letter). This is not foolproof, but it gives a pretty good heuristic that works OK on most texts I checked. Afterwards I can just crop the lines by deleting pixel rows.
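A simplified sketch of that row-clustering heuristic, for the curious. It is not the original script: it assumes a binary image (list of pixel rows, 1 for ink, 0 for background) and simply groups consecutive rows that contain any ink into one text line, which is a cruder test than checking whether neighbouring rows share part of a letter. The name `find_text_lines` is hypothetical.

```python
def find_text_lines(image):
    """Return (first_row, end_row_exclusive) for each run of rows with ink."""
    spans = []
    start = None  # first row of the text line currently being scanned
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r  # a new text line begins
        elif not has_ink and start is not None:
            spans.append((start, r))  # a blank row closes the line
            start = None
    if start is not None:  # the image ended in the middle of a text line
        spans.append((start, len(image)))
    return spans
```

The resulting spans mark where each text line begins and ends, after which cropping is just a matter of blanking the unwanted pixel rows inside each span. Lines whose letters are split by a fully blank pixel row would be detected as two lines, so this is a heuristic at best.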
