The Lessons of Moneyball for Big Data Analysis

Brad Pitt and Jonah Hill in the film "Moneyball," which hits theaters today. Hill's character is based on Mets executive Paul DePodesta, who spoke at the Strata Summit this week.

Long before “Big Data” analysis was cool, Paul DePodesta brought it to the big leagues. And today, his story will be told on the big screen.

A phase of DePodesta’s career is depicted in the movie “Moneyball,” which premieres today on more than 3,800 screens around the country. The film is based on the best-selling 2003 book in which Michael Lewis chronicled the data-driven resurgence of the Oakland A’s engineered by A’s general manager Billy Beane and DePodesta, who used computer analysis to identify undervalued players. The character based on DePodesta has been renamed Peter Brand and is played by Jonah Hill.

In a presentation Tuesday at the Strata Summit in New York, DePodesta, who is now Vice President for Player Development for the New York Mets, reflected on the role of performance analysis in baseball and lessons that can be applied to data-driven organizations. When he arrived in Oakland, DePodesta recalled, small-market teams like the A’s with limited budgets found themselves outgunned in bidding wars with wealthier teams in markets like New York and Boston.

“We had to come up with a different way,” said DePodesta. “It was like preparing a gourmet meal, but having to shop at 7-11.”

Data vs. Scouting Subjectivity
The solution embraced by Beane and DePodesta was influenced by a school of baseball statistical analysis known as sabermetrics (a reference to the Society for American Baseball Research), which was often at odds with traditional methods of scouting players.

“Subjectivity ruled the day in evaluating players,” he said. “We had a completely new set of metrics that bore no resemblance to anything you’d seen. We didn’t solve baseball. But we reduced the inefficiency of our decision making.”

Speaking to a crowd of executives and data scientists, DePodesta discussed the process of making those data-driven decisions, and how to avoid analytical errors that could lead to bad conclusions. In many instances, the challenge is taking a clear-eyed view of the data – which often involves filtering out emotional responses to data and player performance.

“We constantly seek causal relationships, and we can be tricked by them,” said DePodesta. “Often times we get tied to things, and don’t necessarily know why.”

Common Biases in Data Analysis
It’s easy to develop “affirmation bias,” DePodesta said. “Once we’ve made up our minds, we resist information that doesn’t agree with our conclusion,” he said.

A particular problem in baseball is “appearance bias” – the notion that some athletes look more like great baseball players than others. It’s also an issue in business, DePodesta said, citing a data point from Malcolm Gladwell on height and business success. Gladwell found that although just 3.9 percent of American males are 6-foot-2 or taller, about 30 percent of Fortune 500 CEOs are 6-foot-2 or taller.

Making good decisions meant stripping away those biases.

“We turn to data as our flashlight in the cave – our guiding light,” DePodesta said. “We said ‘unless we can prove it, we’re not going to believe it.’ We had to be absolutely relentless in asking the naïve question. The only thing we were wed to was the idea of being open-minded.”

About the Author

Rich Miller is the founder and editor at large of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

Comments


  1. Leighton

    This is very interesting. Goes to show you the power of technology.

  2. John

    Just saw the movie, and came back to research why Paul DePodesta did not allow his name to be used. Ironically, the shy sabermetrician may get more web searches and name recognition boost from his coy approach to this movie than any of the other personalities on whom it is based. I for one am not searching for Billy Beane, Brad Pitt or anyone else right now. Hmm.

  3. I have got to go see this movie today! I taught a business intelligence class last week and one of the students worked for a professional basketball team and brought their data. We used the Excel data mining add-in to explore how we could use it for draft pick analysis. An excellent webinar for doing the same for fantasy baseball can be found here: If you are interested in this sort of technology, be sure to read SQL Server 2008 Data Mining. Chapter two especially because it shows you how to do everything using the Excel data mining add-in.

  4. Love the notion of "affirmation bias" - resisting information that doesn't agree with our conclusions. We see that all the time using software to analyze business performance or more specifically the performance of a business process. Developing a query in and of itself can become a form of affirmation bias - in the sense that one is posing a question of the data, as opposed to a data-driven form of analysis, which is basically interacting with data with an open mind. Whenever anyone says "that never happens" we usually find a very interesting and relevant set of exceptions that shine a light on an unknown issue that would never be discovered in a query-driven approach. Beginning of interesting discussion on Lavastorm Analytics Community Group

  5. Another nice one: