Big Data Doesn’t Interpret Itself

The history of astronomy shows that observations can only explain so much without the interpretive frame of theories and models

Big data and machine learning are powering new approaches to many scientific questions. But the history of astronomy offers an interesting perspective on how data informs science—and perhaps a cautionary tale.

Early Babylonian astronomers took what today we’d call a pure “big data” or “pattern recognition” approach. They accumulated observations of solar, lunar and planetary motion and eclipses for many centuries and identified various cycles that had repeated many times. Simply by assuming that those cycles would continue, they were able to give good advice for planting, irrigation and harvest times, to cast credible horoscopes and to predict in advance when lunar eclipses would occur.

The ancient Greek astronomers used two distinct methods to understand the same data set. The first was to make geometric models that treated the sun, moon, planets and stars as mathematical abstractions—shiny points carried upon uniformly rotating celestial spheres.

At first, the Greeks’ predictions were no better than those of the Babylonians—in fact, they were significantly worse. But they patched things up by postulating additional movements of the spheres, called epicycles. These models, which were perfected by the 2nd-century astronomer Ptolemy, seem ugly in retrospect, but they did package the astronomical data in a relatively compact form, and they gave useful practical results.

The second method used by Greek astronomers was to consider astronomical bodies as real objects with physical properties. Perhaps the high point of this effort was the brilliant determination by Aristarchus, in the 3rd century B.C., of the ratio of the distances from the Earth to the sun and the moon. Assuming that the moon shines by reflected sunlight, and measuring the angle between the sun and the half-moon when both are visible in the sky, he calculated the ratio using simple trigonometry.
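Aristarchus’s reasoning can be checked with a few lines of arithmetic. When the moon appears exactly half lit, the sun–moon–Earth angle is a right angle, so the Earth, moon and sun form a right triangle and the moon-to-sun distance ratio is simply the cosine of the measured angle. The sketch below (a modern reconstruction, not Aristarchus’s own notation) uses his reported measurement of roughly 87 degrees; the true angle is closer to 89.85 degrees, which is why his ratio of about 19 falls far short of the modern value near 390.

```python
import math

def sun_moon_distance_ratio(angle_deg):
    """Ratio of the Earth-sun to Earth-moon distance, given the
    measured angle (in degrees) between the sun and the half-moon.
    At half-moon the right angle sits at the moon, so
    d_moon / d_sun = cos(angle)."""
    return 1.0 / math.cos(math.radians(angle_deg))

# Aristarchus's reported measurement: about 87 degrees
print(round(sun_moon_distance_ratio(87), 1))    # roughly 19
# With the modern value of the angle, about 89.85 degrees
print(round(sun_moon_distance_ratio(89.85)))    # roughly 382
```

The lesson of the example is the one Aristarchus drew: even a crude angle measurement, fed through the right geometric model, reveals that the sun is vastly farther away than the moon.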

Yet a proper synthesis of the mathematical and physical approaches to astronomy wasn’t achieved for many centuries. That’s because the available “big data”—the easily observable patterns of the sun, moon and stars—are cryptic, superficial signs of the deep structure beneath.

Copernicus, in the 16th century, discovered that he could get more beautiful versions of Ptolemy-style models by putting the sun, rather than the Earth, at the center of the celestial spheres. Ptolemy’s work typically gets rough treatment in the history of science, but it was absolutely essential to Copernicus’s breakthrough, which offered a physical explanation of “coincidences” among the parameters of Ptolemy’s model.

Not long after, Galileo’s homemade telescope revealed the phases of Venus, Jupiter’s attendant satellites—a “solar system” in miniature—and the topography of the moon. The night sky came to life as a showcase of tangible, physical bodies rather than an exercise in idealized points and imaginary spheres. When Isaac Newton distilled the universal laws of motion and gravity, he reunited the “big data” approach of the Babylonians and Ptolemy with the physics of Aristarchus and Galileo, launching truly modern science.

The big lesson is that big data doesn’t interpret itself. Making mathematical models, trying to keep them simple, connecting to the fullness of reality and aspiring to perfection—these are proven ways to refine the raw ore of data into precious jewels of meaning. 

Originally appeared on September 5, 2019 on The Wall Street Journal website as ‘Big Data Doesn’t Interpret Itself’.

Frank Wilczek is the Herman Feshbach Professor of Physics at MIT, winner of the 2004 Nobel Prize in Physics, and author of the books Fundamentals: Ten Keys to Reality (2021), A Beautiful Question: Finding Nature’s Deep Design (2015), and The Lightness of Being: Mass, Ether, and the Unification of Forces (2009).