The Lessons of Moneyball for Big Data Analysis
Long before “Big Data” analysis was cool, Paul DePodesta brought it to the big leagues. And today, his story will be told on the big screen.
A phase of DePodesta’s career is depicted in the movie “Moneyball,” which premieres today on more than 3,800 screens around the country. The film is based on the best-selling 2003 book in which Michael Lewis chronicled the data-driven resurgence of the Oakland A’s, engineered by A’s general manager Billy Beane and DePodesta, who used computer analysis to identify undervalued players. The character based on DePodesta has been renamed Peter Brand and is played by Jonah Hill.
In a presentation Tuesday at the Strata Summit in New York, DePodesta, now Vice President for Player Development for the New York Mets, reflected on the role of performance analysis in baseball and on the lessons it holds for data-driven organizations. When he arrived in Oakland, DePodesta recalled, small-market teams like the A’s, with limited budgets, found themselves outgunned in bidding wars with wealthier teams in markets like New York and Boston.
“We had to come up with a different way,” said DePodesta. “It was like preparing a gourmet meal, but having to shop at 7-11.”
Data vs. Scouting Subjectivity
The solution embraced by Beane and DePodesta drew on sabermetrics, a school of baseball statistical analysis (the name comes from SABR, the Society for American Baseball Research) that was often at odds with traditional methods of scouting players.
“Subjectivity ruled the day in evaluating players,” he said. “We had a completely new set of metrics that bore no resemblance to anything you’d seen. We didn’t solve baseball. But we reduced the inefficiency of our decision making.”
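The article doesn't describe the A's actual models, but the basic idea of "identifying undervalued players" can be illustrated as ranking players by production per salary dollar. This is a minimal sketch under loudly stated assumptions: the `Player` record, the single performance metric (`on_base_pct`), the `min_obp` threshold, and the roster data are all hypothetical, and a real evaluation would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    on_base_pct: float  # hypothetical performance metric, e.g. on-base percentage
    salary_musd: float  # hypothetical salary in millions of US dollars (must be > 0)


def rank_by_value(players: list[Player], min_obp: float = 0.0) -> list[Player]:
    """Rank players by performance per salary dollar (an illustrative value measure).

    Players below the min_obp floor are excluded so that a weak but very cheap
    player cannot top the list on salary alone.
    """
    eligible = [p for p in players if p.on_base_pct >= min_obp]
    # Highest performance-per-dollar first: these are the "undervalued" candidates.
    return sorted(eligible, key=lambda p: p.on_base_pct / p.salary_musd, reverse=True)


if __name__ == "__main__":
    # Invented roster data, for illustration only.
    roster = [
        Player("Player A", 0.360, 12.0),
        Player("Player B", 0.350, 1.0),
        Player("Player C", 0.300, 0.5),
    ]
    for p in rank_by_value(roster, min_obp=0.320):
        print(p.name, round(p.on_base_pct / p.salary_musd, 3))
```

Here the cheaper Player B ranks ahead of the expensive Player A despite a slightly lower metric, which is the kind of market inefficiency the quote describes; the floor and the ratio are design choices made for the sketch, not the A's method.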
Speaking to a crowd of executives and data scientists, DePodesta discussed the process of making those data-driven decisions and how to avoid analytical errors that can lead to bad conclusions. In many instances, the challenge lies in taking a clear-eyed view of the data, which often means filtering out emotional responses to the numbers and to player performance.
“We constantly seek causal relationships, and we can be tricked by them,” said DePodesta. “Often times we get tied to things, and don’t necessarily know why.”
Common Biases in Data Analysis
It’s easy to develop “affirmation bias,” DePodesta said: “Once we’ve made up our minds, we resist information that doesn’t agree with our conclusion.”
A particular problem in baseball is “appearance bias,” the notion that some athletes look more like great baseball players than others. It’s also an issue in business, DePodesta said, citing a data point from Malcolm Gladwell on height and business success. Gladwell found that although just 3.9 percent of American males are 6-foot-2 or taller, about 30 percent of Fortune 500 CEOs are.
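Gladwell's two figures can be collapsed into one number. If height had nothing to do with who becomes a CEO, we would expect roughly 3.9 percent of CEOs to be 6-foot-2 or taller, so 30 percent is about 7.7 times that share. A minimal sketch of the arithmetic, using only the figures quoted above; the function name `overrepresentation` is our own, hypothetical label.

```python
def overrepresentation(share_in_group: float, share_in_population: float) -> float:
    """Ratio of a trait's share in a selected group to its share in the population.

    A value of 1.0 means the group looks like the population; larger values mean
    the trait is overrepresented. This measures association only, not causation.
    """
    if share_in_population <= 0:
        raise ValueError("population share must be positive")
    return share_in_group / share_in_population


# Figures quoted in the article: about 30% of Fortune 500 CEOs vs. 3.9% of American men.
ratio = overrepresentation(0.30, 0.039)
print(f"Men 6-foot-2 or taller are about {ratio:.1f}x overrepresented among these CEOs")
```

The ratio says nothing about why tall men are overrepresented, which is exactly the kind of tempting causal story DePodesta warns against.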
Making good decisions meant stripping away those biases.
“We turn to data as our flashlight in the cave – our guiding light,” DePodesta said. “We said ‘unless we can prove it, we’re not going to believe it.’ We had to be absolutely relentless in asking the naïve question. The only thing we were wed to was the idea of being open-minded.”
Leighton, posted September 23rd, 2011
This is very interesting. Goes to show you the power of technology.
John, posted September 23rd, 2011
Just saw the movie, and came back to research why Paul DePodesta did not allow his name to be used. Ironically, the shy sabermetrician may get more web searches and a bigger name-recognition boost from his coy approach to this movie than any of the other personalities on whom it is based. I for one am not searching for Billy Beane, Brad Pitt or anyone else right now. Hmm.
I have got to go see this movie today! I taught a business intelligence class last week and one of the students worked for a professional basketball team and brought their data. We used the Excel data mining add-in to explore how we could use it for draft pick analysis. An excellent webinar for doing the same for fantasy baseball can be found here: http://pragmaticworks.com/Resources/webinars/WebinarSummary.aspx?ResourceID=289
If you are interested in this sort of technology, be sure to read SQL Server 2008 Data Mining, especially chapter two, because it shows you how to do everything using the Excel data mining add-in.
Love the notion of “affirmation bias” – resisting information that doesn’t agree with our conclusions. We see that all the time using software to analyze business performance or more specifically the performance of a business process.
Developing a query can in and of itself become a form of affirmation bias, in the sense that one is posing a question of the data, as opposed to a data-driven form of analysis, which is basically interacting with data with an open mind. Whenever anyone says “that never happens,” we usually find a very interesting and relevant set of exceptions that shine a light on an unknown issue that would never be discovered in a query-driven approach. This is the beginning of an interesting discussion on the Lavastorm Analytics Community Group.