
The Selfish Dataome

Does the data we produce serve us, or vice versa?

Caleb Scharf


You’ve heard the argument before: Genes are the permanent aristocracy of evolution, looking after themselves as fleshy hosts come and go. That’s the thesis of a book that, in 2017, was christened the most influential science book of all time: Richard Dawkins’ The Selfish Gene.

But we humans actually generate far more actionable information than is encoded in all of our combined genetic material, and we carry much of it into the future. The data outside of our biological selves—call it the dataome—could actually represent the grander scaffolding for complex life. The dataome may provide a universally recognizable signature of the slippery characteristic we call intelligence, and it might even teach us a thing or two about ourselves.

It is also something that carries a considerable energetic burden. That burden challenges us to ask whether we are manufacturing and protecting our dataome for our benefit alone or, like the selfish gene, because the data makes us do it, since that is what ensures its propagation into the future.

Take, for instance, William Shakespeare.

Who’s in Charge?: The bard has become a living part of the human dataome. Photo by John Taylor / Wikipedia.

***

Shakespeare died on April 23, 1616, and his body was buried two days later in Holy Trinity Church in Stratford-upon-Avon. His now-famous epitaph carries a curse on anyone who dares “move my bones.” And as far as we know, in the past 400 years, no one has risked incurring Will’s undead wrath.

But he has most certainly lived on beyond the grave. At the time of his death Shakespeare had written a total of 37 plays, among other works. Those 37 plays contain a total of 835,997 words. In the centuries that have come after his corporeal life an estimated 2 to 4 billion physical copies of his plays and writings have been produced. All of those copies have been composed of hundreds of billions of sheets of paper acting as vessels for more than a quadrillion ink-rich letters.


Across time these billions of volumes have been physically lifted and transported, dropped and picked up, held by hand, or hoisted onto bookshelves. Each individual motion has involved a small expenditure of energy, maybe a few Joules. But that has added up across the centuries. It’s possible that altogether the simple act of human arms raising and lowering copies of Shakespeare’s writings has expended well over 4 trillion Joules of energy. That’s equivalent to combusting more than a hundred thousand kilograms of coal.
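As a rough check on the scale of this estimate, the work done lifting a book is just mass × gravity × height. The sketch below uses illustrative assumptions that are not figures from the essay (a 0.5 kg volume raised 1 m, and 3 billion copies, the midpoint of the 2 to 4 billion range) and asks how many lifts per copy the 4-trillion-Joule figure implies.

```python
# Rough sanity check of "a few Joules" per lift and the ~4 trillion J total.
# ASSUMPTIONS (illustrative, not from the essay): a 0.5 kg book raised 1 m,
# and 3 billion copies, the midpoint of the 2 to 4 billion estimate.
G = 9.81  # gravitational acceleration, m/s^2


def lift_energy(mass_kg: float, height_m: float) -> float:
    """Work done against gravity, E = m * g * h, in Joules."""
    return mass_kg * G * height_m


energy_per_lift = lift_energy(0.5, 1.0)        # ~4.9 J, i.e. "a few Joules"
total_energy = 4e12                             # the essay's lower bound, J
copies = 3e9                                    # assumed midpoint
lifts_needed = total_energy / energy_per_lift   # ~8.2e11 lifts in total
lifts_per_copy = lifts_needed / copies          # roughly 270 lifts per copy

print(f"{energy_per_lift:.2f} J per lift")
print(f"{lifts_per_copy:.0f} lifts per copy over four centuries")
```

A few hundred lifts per copy across four centuries is a modest number, which suggests the 4-trillion-Joule figure is at least plausible under these assumptions.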

Additional energy has been utilized every time a human has read some of those 835,997 words and had their neurons fire. Or spoken them to a rapt audience, or spent tens of millions of dollars to make a film of them, or turned on a TV to watch one of the plays performed, or driven to a Shakespeare festival. Or for that matter bought a tacky bust of “the immortal bard” and hauled it onto a mantelpiece. Add in the energy expenditure of the manufacture of paper, books, and their transport and the numbers only grow and grow.

It may be impossible to fully gauge the energetic burden that William Shakespeare unwittingly dumped on the human species, but it is substantial. Of course, we can easily forgive him. He wrote some good stuff. But there is also a sense in which the data of Shakespeare has become its own living part of the dataome, propagating itself into the future and compelling all of us to support it, just as is happening right now in this sentence.

***

Shakespeare, to be fair, contributed barely a drop to a vast ocean of data that is ethereal yet extremely tangible in its effects upon us. This is both the glory and the millstone of Homo sapiens.

We have been pumping out persistent data since our first oral exchange of a good story and our first experimental handprint on a cave wall. Neither of those things was explicitly encoded in our DNA, yet both could readily outlive the individuals who created them. Indeed, data like these have outlived generation after generation of humans.

But as time has gone by our production of data has accelerated. Today, by some accounts, our species generates about 2.5 quintillion bytes of data a day. That’s more than a billion billion bytes for each planetary rotation. And that rate of output is still growing. While lots of that data is a mixture of fleeting records—from Google searches to air traffic control—more and more ends up persisting in the environment. Pet videos, GIFs, political diatribes, troll responses, as well as medical records, scientific data, business documents, emails, tweets, photo albums, all wind up as semi-permanent electrical blips in doped silicon or magnetic dots on hard drives.

In Perspective: The human genome fits on about two CDs. The human species produces about 40,000 CDs’ worth of data a second, or roughly 20,000 genomes’ worth. Photo by Libor Píška / Shutterstock.
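The comparison between daily data output and genome size is simple unit arithmetic. This sketch assumes a CD holds 700 MB, the haploid human genome is about 3.1 billion base pairs, and each base is stored in 2 bits (one of A, C, G, T); these storage assumptions are illustrative, not taken from the essay.

```python
# Convert the daily data-production estimate into CDs per second and
# compare it with the size of a human genome.
# ASSUMPTIONS: a CD holds 700 MB (700e6 bytes); the haploid human genome is
# ~3.1 billion base pairs; each base is encoded in 2 bits.
BYTES_PER_DAY = 2.5e18               # "2.5 quintillion bytes of data a day"
SECONDS_PER_DAY = 24 * 60 * 60
CD_BYTES = 700e6

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY       # ~2.9e13 bytes/s
cds_per_second = bytes_per_second / CD_BYTES             # ~41,000 CDs/s

haploid_bases = 3.1e9
diploid_bytes = 2 * haploid_bases * 2 / 8                # two copies, 2 bits per base
genome_cds = diploid_bytes / CD_BYTES                    # ~2.2 CDs
genomes_per_second = bytes_per_second / diploid_bytes    # ~19,000 genomes/s

print(f"{cds_per_second:,.0f} CDs per second")
print(f"{genome_cds:.1f} CDs per diploid genome")
print(f"{genomes_per_second:,.0f} genomes' worth per second")
```

Counting both sets of chromosomes is what makes the genome come out at "about two CDs"; a single haploid set at 2 bits per base fits on roughly one.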

This data production and storage takes a lot of energy to maintain, from the moment someone’s hands scrabble for rare-earth elements in the soil, to the electricity that sustains it all. There’s a reason that a large company like Apple builds its own data server farms, and looks for ways to optimize the power generation that these air-conditioned, electron-pushing factories demand, whether it’s building massive solar farms in Nevada or utilizing hydroelectricity in Oregon.

Even Shakespeare’s medium, traditional paper, is still an energy-hungry beast. In 2006 it was estimated that United States paper production gulped down about 2,400 trillion BTUs (about 2.5 billion billion Joules) to churn out 99.5 million tons of pulp and paper products. That amounts to some 28,000 Joules of energy used per gram of final material, before any data is even printed on it. Or to put it another way, this is equivalent to roughly 5 grams of high-quality coal being burnt per page of paper.
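The unit conversions in this paragraph can be checked in a few lines. The sketch assumes the 99.5 million tons are US short tons, which is the reading that reproduces the 28,000 J/g figure (metric tonnes would give closer to 25,000 J/g), along with about 5 g per sheet of paper and about 29.3 MJ/kg for high-quality coal. These are assumptions, not values stated in the essay.

```python
# Check the paper-production figures: total energy, energy per gram,
# and the coal-equivalent of one sheet.
# ASSUMPTIONS: "tons" are US short tons; 1 BTU ~ 1055.06 J;
# one sheet of office paper ~ 5 g; high-quality coal ~ 29.3 MJ/kg.
JOULES_PER_BTU = 1055.06
KG_PER_SHORT_TON = 907.185
COAL_JOULES_PER_GRAM = 29.3e3   # 29.3 MJ/kg expressed per gram
PAGE_GRAMS = 5.0

total_joules = 2.4e15 * JOULES_PER_BTU            # 2,400 trillion BTU -> ~2.5e18 J
total_grams = 99.5e6 * KG_PER_SHORT_TON * 1000    # 99.5 million short tons -> ~9.0e13 g
joules_per_gram = total_joules / total_grams      # ~28,000 J per gram of paper

joules_per_page = PAGE_GRAMS * joules_per_gram    # ~140,000 J per sheet
coal_grams_per_page = joules_per_page / COAL_JOULES_PER_GRAM  # ~4.8 g of coal

print(f"total energy: {total_joules:.2e} J")
print(f"energy per gram: {joules_per_gram:,.0f} J/g")
print(f"coal per page: {coal_grams_per_page:.1f} g")
```

Because the per-page figure scales linearly with sheet weight, a heavier or lighter paper stock shifts the coal equivalent proportionally.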

Why are we doing this? Why are we expending ever increasing amounts of effort to maintain the data we, and our machines, generate? This behavior may represent far more than we at first think.


On the face of things, it seems pretty obvious that our capacity to carry so much data with us through time is a critical part of our success at spreading across the planet. We can continually build on our knowledge and experience in a way that no other species seemingly does. Our dataome provides us with a massive evolutionary advantage.

But it’s clearly not free. We may be trapped in a bigger Darwinian reality where we are in effect now serving as a supporting organelle for our own dataome.

This is an unsettling framework for looking at ourselves. But it has parallels in other parts of the natural world. Our microbiome, made up of tens of trillions of single-celled organisms, is perpetuated not so much by us as individuals, but by generations of us carrying this biological information through time. Yet we could also flip this around and conceptualize the situation as the microbiome carrying us through time. The microbiome exists in us because we’re a good environment. But that’s a symbiotic relationship. The microbes have to do things a certain way, have to work at supporting their human carrying systems. A human represents an energetic burden as much as an evolutionary advantage to microbes. Similarly, our dataome is both an advantage to us humans, and a burden.

The question is, is our symbiosis still healthy? The present-day energetic burden of the dataome may well be at its highest level in the history of our species. It doesn’t necessarily follow that we’re experiencing a correspondingly large benefit. We might do well to examine whether there is an optimal state for the dataome, a balance between the evolutionary advantages it confers on its species and the burden it represents.

The proliferation of data of seemingly very low utility (which I might grumpily describe as cat pictures and selfies) could actually be a sign of worrying dysfunction in our dataome. In other words, undifferentiated and exponential growth of low-value data suggests that data can get cancer. If so, we’d do well to take this seriously as a human health issue, especially if treatment would also reduce our global energy burden, and therefore our impact on the planetary environment.

Improving the utility of our data, and purging it of energy-wasting junk, might not be popular, but it could perhaps be incentivized: either through data credit schemes akin to domestic solar power feeding back to the grid, or by making the loss of data a positive feature. What you might call a Snapchat approach.

In that case, the human-dataome symbiosis might become the only example in nature of a symbiotic relationship that is consciously managed by one party. What the long-term evolutionary robustness of that would be is hard to say.

But, more optimistically, if the dataome is indeed an integral and integrated part of our evolutionary path, then perhaps by mining it we can learn more about not just ourselves and our health, but the nature of life and intelligence in general. Precisely how we interrogate the dataome is a wide-open question. There may be emergent structure within it that we simply haven’t recognized, and we will need to develop measures and metrics to examine it properly. Existing tools like network theory or computational genomics might help.

The potential gains of such an analysis could be enormous. If the dataome is a real thing, then it represents a missing piece of the puzzle of how a sentient species functions and evolves. We’d do well to at least take a look. As Shakespeare wrote in All’s Well That Ends Well: “The web of our life is of a mingled yarn, good and ill together.”

Caleb Scharf is an astrophysicist, the Director of Astrobiology at Columbia University in New York, and a founder of yhousenyc.org, an institute that studies human and machine consciousness. His latest book is The Zoomable Universe: An Epic Tour Through Cosmic Scale, from Almost Everything to Nearly Nothing.