
How the Space Program Created a Culture of Learning From Failure

In space, any failure could spell disaster. On Earth, failures were priceless gifts.

Fast Company


Photo: close-up of a rocket engine and its exhaust pipes. Credit: pidjoe/Getty Images

The Apollo lunar module was a special spaceship in many ways. It was the first—and still the only—crewed spaceship designed just to be used in space. That’s why it looks so spindly and gawky. It never had to fly through the atmosphere, so it could be designed with almost pure utility, without the sleekness and protection required to account for atmospheric friction or drag.

But that special quality of the lunar module created a huge problem: It couldn’t be flight-tested before use. Each lunar module got its first shakedown flight at the moment it was being used in space. It also meant that the astronauts flying the lunar module could only ever learn to fly it using simulators (also designed by people who had never flown lunar modules).

So how do you make sure something that can’t be tested is safe? How did the people responsible for this space vehicle make sure it would do what NASA needed it to do when the astronauts climbed in and powered up?

At Grumman, the company that designed and built the lunar modules at a factory in Bethpage, Long Island, one part of the answer was: Every single failure of every single component had to be investigated, understood, and resolved.

The philosophy, in engineering parlance: There are no random anomalies.

It was an ethic that ended up permeating the entire lunar module effort: Anything that went wrong with any part or system could spell disaster in space. Therefore, anything that failed on Earth had to be investigated and resolved. If there are no random failures, then every failure is, in fact, a test, and even more, a gift: With a vehicle that would only fly once, the only time to find and fix problems was before that first flight.

Two failures, in particular, illustrate the intensity of the effort and some of the challenges of engineering.

One was dramatic and high profile.

In December 1967, a lunar module that had been finished was being given a pressure test; the cabin was sealed and pressurized, as it would be in spaceflight. At Grumman, this vehicle was designated LM-5. Later, everyone would come to know it as Eagle, the lunar module that would be the first to land on the Moon during Apollo 11. As the pressure was brought up, one of the two distinctive triangular windows in front shattered.

It was a stunning moment. That window was just a single part in a spaceship with 1 million parts, but if a lunar module window shattered in flight, the two astronauts would die instantly.

“The window shattered without anybody touching it,” said Joe Gavin, the senior Grumman executive in Bethpage in charge of the lunar module. The windows were specially made by Corning, regarded as the best glass company in the world (and the company that went on to make the windows for every NASA spacecraft). The glass was single-pane and three-eighths of an inch thick.

Was there a flaw in the window when the glass was cast and finished? Had it been installed incorrectly? Had it been accidentally damaged after being installed?

“We went back to Corning and reviewed the whole process,” said Gavin. The staff at Grumman ended up “finding out a lot about glass that I don’t think any of us realized.”

In fact, as a result of the shattered window, NASA developed a new pre-acceptance testing procedure to make sure the windows Grumman was getting from Corning really were robust and correctly made. A protective cover was also developed for the installed windows, one that showed a mark if anyone so much as touched them.

“It was one of the last things done before launch at the Cape,” Gavin said, “to tear off those protective covers.”

The newly rigorous window screening tests proved invaluable: Eight lunar module windows failed during acceptance testing over the life of the Apollo program. In flight, though, Gavin noted, “we never had a . . . problem.”

But despite that intensive investigation, no one ever figured out why that window had shattered in the first place, a resolution, Gavin said, that “never . . . gave you a completely warm feeling.”

A second failure happened deep inside the system of Apollo manufacturing and testing, and it shows why such care was taken both in assembly procedures (lunar modules were built in a vast clean room, 200 feet long, 80 feet wide, and 35 feet high) and in tracking every part of every process.

The spacecraft that flew all the way to the Moon (the command module, the service module, and the lunar module) had 71 pressurized tanks among them, holding fuel as well as gases of all kinds.

Also in 1967, a tank for the lunar module, designed to hold helium at an extremely cold temperature, failed while it was being tested at the company that made it, AiResearch in California.

That was the point of testing, of course, to catch the problems. The failure happened at a weld between two hemispheres of the tank.

And so a search for the problem began. The welding rod used to seal the tank’s parts was found and examined; it was the right material. But a microscopic examination of the place where the failure originated in the tank showed tiny cracks. Thomas Kelly was the senior engineer at Grumman in charge of the lunar module’s design and assembly, and in his book Moon Lander, Kelly tells the story of what happened.

A man named Henry Graf, the manager of the supercritical helium systems at AiResearch, “became obsessed with finding the cause of this failure,” Kelly writes:

He led his engineering and quality staff through a minute examination of every step in the manufacturing process, starting with the receipt of the titanium forgings and the quality pedigree that accompanied them. At each step of the process, they looked at what had been done on the failed tank, and asked whether anything in this step was different from their process on the previous tanks.

Graf’s careful detective work paid off, discovering a cause so trivial that a less observant investigator would surely have overlooked it. Graf noticed one minor difference in the process for this tank and those that had preceded it: Instead of using new cloth pads to wipe the tank surfaces prior to welding, washed, re-used cloths were employed.

Examination of the washed cloths showed traces of detergent, and test samples [of titanium] that were wiped with them failed under combined stress and humidity testing. The trace detergent attacked titanium! There could be no more gripping example of the extreme sensitivity of highly stressed tank material and welds to contamination.

A change in procedure that might well have been thoughtful and well-motivated—and that, as Kelly pointed out, could easily have been overlooked—put at risk a future Moon landing in a way no one could have predicted, even if they had known about the change in advance. Indeed, had Graf not obsessively tracked the cause down, the laundered wiping cloths, with their trace detergent residue, might have damaged dozens of future tanks.

In the end, the staff making the lunar modules, and the parts for them, tracked 14,000 unexpected failures. By the time six lunar modules had landed on the Moon successfully, only 22 of those, including the shattered triangular window, were unresolved.


Charles Fishman, who has written for Fast Company since its inception, has spent four years researching and writing One Giant Leap, his New York Times best-selling book about how it took 400,000 people, 20,000 companies, and one federal government to get 27 people to the Moon.




This post originally appeared on Fast Company and was published June 28, 2019. This article is republished here with permission.
