If one is looking for signals from an extraterrestrial civilization, why not practice on some of the non-human communication systems already known on our own planet? Whales have had a global communication system for millions of years—longer than Homo sapiens has even existed. Bees, which communicate in part by dancing, had democratic debates about the best places to swarm millions of years before humans came up with democracy as a political system. And other examples abound. No person I know of who has studied another animal’s communication system has ever concluded that the species was dumber than they’d previously thought.
Through the study of animal communication, my colleagues and I have developed a new kind of detector, a “communication intelligence” filter, to determine whether a signal from space is from a technologically advanced civilization or not. Most previous SETI (Search for Extraterrestrial Intelligence) efforts have looked for radio transmissions with a narrow band of frequencies or for optical signals that blink very rapidly. From what we know about astrophysics, such transmissions would be clearly artificial, and their discovery would indicate technology capable of transmitting a signal over interstellar distances. SETI efforts generally throw away wideband radio signals and slower optical pulses, whose provenance is less obvious. Although those signals might well be from intelligent beings, they might also originate in natural sources of radio waves, such as interstellar gas clouds, and we have lacked a good way to tell the difference.
Put simply, we may well have received a message from intelligent beings and neglected it because it didn’t conform to our expectations for what a signal should look like. And this might be why we have yet to detect any interstellar communications in 50 years of searching.
Over the past decade and a half, my colleagues and I have sought a better way. We have been applying information theory to human and animal communication systems, and we can now tell that a particular species could be communicating complex ideas even though we don’t know what it is saying. (We use the term “communication system” in order not to prejudge whether other species have language in the human sense.) Complex communication follows general syntax-like rules that reveal what might be called intelligence content. If we have a large enough sample of the message, we can quantify its degree of complexity or rule structure. In the mathematics of information theory, this structure is called “conditional information entropy” and is made up of mathematical relations among the elementary units of communication, such as letters and phonemes. In everyday speech, we recognize this structure as grammar and, at an even more basic level, as the packaging of sounds into words and sentences. For the first time, we at the SETI Institute in Mountain View, California, have begun to look for this structure in SETI data.
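These quantities can be estimated directly from symbol counts. The sketch below is a minimal illustration, with a repeated pangram standing in for a real corpus: it compares the first-order entropy of single letters with the conditional entropy of a letter given the one before it. Rule structure linking adjacent letters shows up as the drop between the two numbers.

```python
import math
from collections import Counter

def entropies(text):
    """Estimate the first-order entropy H(X) of single letters and the
    conditional entropy H(next letter | current letter)."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    # First-order entropy: uncertainty of one letter drawn at random.
    singles = Counter(letters)
    h1 = -sum((c / n) * math.log2(c / n) for c in singles.values())
    # Joint entropy of adjacent letter pairs; subtracting H(X) gives an
    # estimate of the conditional entropy H(next | current).
    pairs = Counter(zip(letters, letters[1:]))
    m = n - 1
    h_joint = -sum((c / m) * math.log2(c / m) for c in pairs.values())
    return h1, h_joint - h1

# Toy sample standing in for a real corpus (assumed for illustration).
sample = "the quick brown fox jumps over the lazy dog " * 50
h1, h_cond = entropies(sample)
# Structure between adjacent letters makes h_cond much lower than h1.
```

Because English letters strongly constrain their neighbors, the conditional entropy comes out well below the first-order entropy; for a random stream of the same letters, the two would be nearly equal.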
My colleagues Brenda McCowan and Sean F. Hanser at the University of California at Davis and I decided to study species that were both socially complex and highly dependent on acoustic communication, using sound signals that we could readily classify. Thus, our first three subject species were bottlenose dolphins (Tursiops truncatus), squirrel monkeys (Saimiri sciureus), and humpback whales (Megaptera novaeangliae).
One aspect of human linguistics that emerged from early statistical studies of letters, words, and phonemes is known as Zipf’s Law, after the Harvard University linguist George Zipf. In English text, there are more e’s than t’s, more t’s than a’s, and so on, down to the least frequent letter, “q.” If one lists the letters from “e” to “q” in descending order of frequency and plots their frequencies on a log-log graph, one can fit the values with a 45-degree line—that is, a line with a slope of –1. If one does the same thing with text made up of Chinese characters, one also gets a –1 slope. And the same is true of the letters, words, or phonemes of a conversation in Japanese, German, Hindi, and dozens of other languages. Baby babbling does not obey Zipf’s Law. Its slope is shallower than –1 because the sounds spill out nearly at random. But as children learn their language, the slope gradually steepens and reaches –1 by about the age of 24 months.
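The slope itself can be estimated with a simple least-squares fit of log frequency against log rank. The sketch below is a self-contained illustration: the synthetic sample is constructed to be ideally Zipfian, with the k-th most common symbol appearing in proportion to 1/k, so the fit recovers a slope near –1.

```python
import math
from collections import Counter

def zipf_slope(units):
    """Least-squares slope of log10(frequency) vs. log10(rank)."""
    counts = sorted(Counter(units).values(), reverse=True)
    xs = [math.log10(r) for r in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic sample: symbol k occurs about 1000/k times, so frequency
# falls off as 1/rank -- the ideal Zipfian case.
synthetic = [k for k in range(1, 27) for _ in range(round(1000 / k))]
slope = zipf_slope(synthetic)  # close to -1
```

Run on the letters of real English text instead of the synthetic sample, the same function yields a slope near –1; run on near-random babble, it yields something much shallower.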
According to mathematical linguists, this –1 slope indicates that a given series of sounds or written symbols contains enough complexity to constitute a language. It is a necessary but not sufficient condition, meaning that this is the first test for complexity, but not a proof of it. According to Zipf himself, the reason for this –1 slope is a tradeoff he called the “principle of least effort.” It strikes a balance between the transmitter, who would like to expend the least amount of energy sending the signal, and the receiver, who would like the most redundancy to ensure that the whole message was received.
The key to the application of information theory is isolating the signaling units. For example, just plotting all the dots and dashes in Morse code will give a Zipf slope of about –0.2. But if one takes multiple dots and dashes as the elementary units—dot dot, dot dash, dash dot, and dash dash, as well as longer sequences—the slope tilts toward –1, reflecting how letters of the alphabet are encoded in this system. In this way, one can reverse-engineer what the original units of meaning are.
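This regrouping can be tried directly. The sketch below is an illustration, using a sentence from this article as the text sample: scoring each dot and dash as its own unit yields a shallow slope, while bundling them into letter-sized code groups steepens the slope toward that of English letter frequencies. (For simplicity the code uses the known letter boundaries; a true reverse-engineering effort would search over possible groupings until the slope approached –1.)

```python
import math
from collections import Counter

# International Morse Code for the 26 letters.
MORSE = {
    'a': '.-',   'b': '-...', 'c': '-.-.', 'd': '-..',  'e': '.',
    'f': '..-.', 'g': '--.',  'h': '....', 'i': '..',   'j': '.---',
    'k': '-.-',  'l': '.-..', 'm': '--',   'n': '-.',   'o': '---',
    'p': '.--.', 'q': '--.-', 'r': '.-.',  's': '...',  't': '-',
    'u': '..-',  'v': '...-', 'w': '.--',  'x': '-..-', 'y': '-.--',
    'z': '--..',
}

def zipf_slope(units):
    """Least-squares slope of log10(frequency) vs. log10(rank)."""
    counts = sorted(Counter(units).values(), reverse=True)
    xs = [math.log10(r) for r in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

text = ("if one is looking for signals from an extraterrestrial civilization "
        "why not practice on some of the non human communication systems "
        "already known on our own planet whales have had a global "
        "communication system for millions of years")
letters = [c for c in text if c.isalpha()]

# Wrong units: every dot and dash scored separately (only two symbol types).
symbols = list(''.join(MORSE[c] for c in letters))
# Regrouped units: dots and dashes bundled into letter-sized codes.
codes = [MORSE[c] for c in letters]

slope_symbols = zipf_slope(symbols)  # shallow: the units are too small
slope_codes = zipf_slope(codes)      # steeper: near the letter-frequency slope
```

The point of the exercise is that the slope is a diagnostic of whether one has carved the signal at its natural joints: the grouping that drives the slope toward –1 is a good candidate for the system’s true elementary units.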
Most linguists used to suppose that Zipf’s Law was a characteristic of human languages only. So we were quite excited to find that, when we plotted the frequency of occurrence of adult bottlenose-dolphin whistles, they, too, obeyed Zipf’s Law! Later, when two baby bottlenose dolphins were born at Marine World in California, we recorded their infant whistles and discovered that they had the same Zipf’s Law slope as human baby babbling. Thus baby dolphins babble their whistles and must learn their communication system in much the way human infants learn their languages. By the time the dolphins were 12 months old, the frequency distribution of their whistles had reached a –1 slope as well.
Although we have yet to decipher what bottlenose dolphins are saying, we went on to establish that they and whales have a communication system with an internal complexity approaching that of human languages. This complexity makes communication resilient. Any creature that exchanges information has to be able to do so despite ambient noise, intervening obstacles, and other effects that interfere with signal propagation. Human language is structured to provide redundancy. At the most basic level, this structure determines the probability that a given letter will appear. If I tell you I’m thinking of a word, you might guess that the first letter is “t,” since that is the most common first letter of words in English. Your guess would be safe, but not very informative. If instead you guess “q” and you are correct, you gain some real information about the word I am thinking of, precisely because so few English words begin with “q.”
Now take this a step further. If I told you that the letter I am thinking of is the second letter in a word whose first letter is “q,” you would immediately guess the letter “u.” Why? Because you know that these two letters go together with almost 100-percent probability in English. To guess what is missing, you are using not just the probability of occurrence of a letter, but the conditional probability between these two letters—namely, the probability of “u” given that the letter “q” has already occurred. Our brains use conditional probabilities whenever they need to fix errors in transmissions, such as faded text on a low-toner copy of a paper or garbled words in a noisy phone call.
For English words, conditional probabilities can be specified up to about nine words in a row. If you are missing one word, you can probably guess it by the context; if you are missing two words in a row, you can often still recover them from the context. As a short example, take a sentence missing a word: “How are (blank) doing today?” We can easily fill in the missing word “you” from the rules we know about the English language. Now consider a sentence missing two words: “How (blank) (blank) doing today?” It could be: “How is Joe doing today?” But there are other possibilities. Clearly, the more words that are missing, the harder it is to fill them in from context, and the lower the conditional probability between them. For most human written language, the conditional dependencies disappear when one is missing about nine words in a row. With 10 words missing, one really has no clue what these missing words could be. In the language of information theory, human word entropy goes up to about ninth-order.
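Filling in a blank from context is exactly what a conditional-probability model does. The sketch below is a toy bigram (second-order) model built from an assumed four-sentence corpus; a real estimate would need vastly more text. It guesses the most probable missing word given only the word before it.

```python
from collections import Counter, defaultdict

# Toy corpus, assumed for illustration; real models need far more text.
corpus = ("how are you doing today "
          "how is joe doing today "
          "how are you feeling today "
          "how are you doing now").split()

# Count adjacent word pairs to estimate P(next word | previous word).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def fill_blank(prev):
    """Most probable next word after `prev`, with its conditional probability."""
    following = bigrams[prev]
    word, count = following.most_common(1)[0]
    return word, count / sum(following.values())

# "How (blank) ..." -- conditioned only on the single preceding word:
word, p = fill_blank("how")  # "are", with probability 3/4 in this corpus
```

Extending the counts from pairs to runs of three, four, and more words is what it means to measure higher-order entropy: the order at which the conditional dependencies fade out (about ninth-order for English) is a fingerprint of the communication system.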
We have discovered these conditional probabilities within animal communication systems as well. As an example, we were recording the sounds of humpback whales in southeastern Alaska with Fred Sharpe of the Alaska Whale Foundation. Humpbacks are famous for their songs, which are typically recorded when the whales go to Hawaii to mate. The calls they make in Alaska are very different: not songs, but feeding calls, used to herd fish into nets made of bubbles, and social calls. We recorded these vocalizations in the presence and in the absence of boat noise and calculated the degree to which the ocean channel acted like static on a phone line. Then we used information theory to quantify how much the whales would have to slow down their vocalizations in order to ensure error-free reception of their messages.
As expected, with boat noise, the whales slowed their rate of vocalization, just as one would do when talking on the phone with noise in the background. But they were slowing down in their transmissions by only about three-fifths of what they theoretically needed to do in order to ensure that the entire message was received without misunderstandings. How did they get away with not slowing down their vocalizations as much as the noise level seemed to require? We pondered this for some time before realizing that their communication system must have enough rule structure to recover the final two-fifths of the signal. The humpbacks were exploiting conditional probabilities between their sonic equivalent of words. They did not have to receive the whole message to be able to fill in the blanks.
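The arithmetic behind this inference can be sketched with Shannon’s channel-capacity formula. The numbers below are assumed for illustration, not the study’s measurements; the point is only that if a sender slows by three-fifths of the factor the noisier channel requires, the signal’s internal structure must supply the remaining two-fifths as redundancy.

```python
import math

# Illustrative signal-to-noise ratios (assumed, not measured values).
snr_quiet = 100.0   # without boat noise
snr_noisy = 10.0    # with boat noise
bandwidth = 1.0     # channel bandwidth, normalized

# Shannon's capacity C = B * log2(1 + SNR) bounds the error-free rate.
c_quiet = bandwidth * math.log2(1 + snr_quiet)
c_noisy = bandwidth * math.log2(1 + snr_noisy)

# Factor by which transmission must slow for error-free reception in noise.
required_slowdown = c_quiet / c_noisy

# The whales slowed by only about three-fifths of that factor...
observed_slowdown = 0.6 * required_slowdown

# ...so predictive structure in the signal itself must cover the rest,
# letting listeners fill in what the noise removes.
redundancy_from_structure = 1 - observed_slowdown / required_slowdown  # 0.4
```

In other words, the shortfall between the observed and the capacity-required slowdown is itself a measurement: it puts a lower bound on how much rule structure the whales’ communication system must contain.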
We found internal structure in dolphin communication as well. The big difference is that the dolphins have a core of about 50 signal types, whereas the humpbacks have hundreds. We are currently collecting data to determine what the highest-order entropy of the humpback whale communication system may be.
As a test of our approach’s ability to distinguish astrophysics from an intelligent signal, we turned to an example from radio astronomy. When pulsars were discovered by astronomers Jocelyn Bell Burnell and Antony Hewish in 1967, they were dubbed “LGMs,” for “little green men.” Because these radio sources pulsed so regularly, some scientists initially speculated that they could be beacons of very advanced extraterrestrials. So we re-analyzed the pulses from the Vela Pulsar with the help of Simon Johnston of the Australia Telescope National Facility and obtained a Zipf slope of about –0.3 for the pulsar signals. This is inconsistent with any language as we know it. In addition, we found little or no conditional probabilistic structure within the pulsar signals. And indeed, pulsars are now known to be rotating neutron stars, the natural remnants of supernova explosions. Information theory could thus easily distinguish between a putative intelligent signal and a natural source.
We are currently analyzing microwave data obtained at the SETI Institute’s Allen Telescope Array, which consists of 42 individual telescopes observing in the frequency band from 1 to 10 gigahertz. In addition to the usual technique of looking for narrow radio carrier waves, we are now beginning to apply information-theoretic measures. This work is being performed in collaboration with Gerry Harp, Jon Richards, and Jill Tarter of the SETI Institute. If we find signals that obey Zipf’s Law, for example, that would encourage us to look for syntax-like structure within the signals in order to quantify how complex the candidate message actually is.
To transmit knowledge, even a very advanced extraterrestrial civilization would still have to obey the rules of information theory. While perhaps not being able to decipher such a message because of lack of common symbols (the same problem we have with, for example, humpback whales), we would get an indication of how complex their communication system—and thereby their thought processes—may be. If the conditional probabilities of a SETI signal are, for example, 20th-order, then not only is the signal artificial in origin, but it would reflect a language far more complex than any on Earth. We would have a quantitative measure of the complexity of the thought processes of a transmitting ETI species.
Laurance R. Doyle is the director of the Institute for the Metaphysics of Physics at Principia College in Elsah, Illinois, and the organizer of the Quantum Astrophysics Group at the SETI Institute in Mountain View, California. He was a member of the NASA Kepler Mission Science Team and led the team that made the first direct detection of a circumbinary planet, Kepler-16b (nicknamed “Tatooine”).