The NCAA basketball tournament and accompanying March Madness kicks off today. (Photo: Jeff Turner via Wikimedia Commons)

Big Data: The New Crystal Ball for Deciphering NCAA March Madness

As March Madness kicks off in earnest today, data is the new crystal ball, playing a growing role in office pools and pundit prognostications. Data scientists are using analytics to predict tournament bids, and companies are sponsoring competitions to master tournament bracketology.

Two examples: university business professors using SAS analytics software have accurately predicted the at-large teams in the NCAA tournament, and predictive analytics competition site Kaggle has teamed with Intel to launch March Machine Learning Mania, in which participants build analytical models to predict the outcome of the tournament.

Analytical Madness

More than a decade ago, professors Jay Coleman of the University of North Florida in Jacksonville, Allen Lynch of Mercer University in Macon, Georgia, and Mike DuMond of Charles River Associates and Florida State University in Tallahassee created the Dance Card, a formula designed to predict which teams will receive at-large bids to the NCAA Tournament (aka the Big Dance). For the recently announced 2014 field, the Dance Card formula correctly predicted 35 of the 36 at-large bids. Over the last three years, the model has correctly predicted a combined 108 of 110 at-large bids.

The Dance Card also serves as a teaching tool for the professors' students. Its analysis points to the factors that the Tournament Selection Committee weighs most heavily, including Rating Percentage Index (RPI), Sagarin rankings (USA Today), and wins against top 25 teams, among others. In an accompanying video, the professors discuss using SAS analytics to develop the Dance Card formula and how the project came together.
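To make the idea of a formula that weighs several factors concrete, here is a minimal sketch in Python. It is not the professors' actual Dance Card model, which was built in SAS and whose form and coefficients are not given here. The logistic scoring function, the weights, the intercept, and the three teams are all hypothetical, chosen only to show how ranked factors could be combined into a single score and used to fill a fixed number of at-large slots.

```python
import math

# Hypothetical weights: illustrative only, not the professors' fitted values.
WEIGHTS = {
    "rpi_rank": -0.05,      # a lower (better) RPI rank raises the score
    "sagarin_rank": -0.03,  # likewise for the Sagarin ranking
    "top25_wins": 0.40,     # each win over a top 25 team helps
}
INTERCEPT = 2.0  # hypothetical baseline


def bid_probability(team):
    """Map a team's factors to a score between 0 and 1 (logistic function)."""
    score = INTERCEPT + sum(WEIGHTS[name] * team[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


def predict_at_large(teams, slots):
    """Return the names of the `slots` teams with the highest scores."""
    ranked = sorted(teams, key=lambda name: bid_probability(teams[name]), reverse=True)
    return ranked[:slots]


# Made-up teams for illustration.
teams = {
    "Team A": {"rpi_rank": 20, "sagarin_rank": 25, "top25_wins": 3},
    "Team B": {"rpi_rank": 60, "sagarin_rank": 55, "top25_wins": 0},
    "Team C": {"rpi_rank": 45, "sagarin_rank": 40, "top25_wins": 1},
}
print(predict_at_large(teams, slots=2))
```

In a real model the weights would be estimated from past selection decisions rather than set by hand, and the quality of the fit would be judged the way the article does, by counting how many at-large bids the formula gets right.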

Harnessing Machine Learning

Kaggle, an online platform for predictive modeling and analytics competitions, is running a contest that applies analytics to the NCAA tournament, called March Machine Learning Mania. Contestants receive nearly two decades of historical game data, and the challenge is to turn that information into insight: they build and test their models on past results, then predict the outcome of the 2014 tournament. The challenge, which began in January and is sponsored by Intel (INTC), awards a $15,000 cash prize to the team with the most accurate predictions.
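The build, test, and predict workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the game results are made up, the smoothed win-rate model is a deliberately simple stand-in for what contestants would actually build, and log loss is used as the accuracy measure purely as an assumption, since the article does not say how predictions are scored.

```python
import math
from collections import defaultdict

# Made-up historical results as (winner, loser) pairs.
history = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]


def fit_win_rates(games):
    """Estimate each team's win rate, smoothed so no rate is exactly 0 or 1."""
    wins, played = defaultdict(int), defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {team: (wins[team] + 1) / (played[team] + 2) for team in played}


def win_probability(rates, team, opponent):
    """Probability that `team` beats `opponent`, from relative win rates."""
    return rates[team] / (rates[team] + rates[opponent])


def log_loss(predictions):
    """Average log loss over (predicted probability, actual outcome) pairs."""
    total = 0.0
    for prob, won in predictions:
        total += -math.log(prob if won else 1.0 - prob)
    return total / len(predictions)


rates = fit_win_rates(history)
# Hypothetical tournament games: (team, opponent, did team win?)
tournament = [("A", "C", True), ("B", "C", False)]
scored = [(win_probability(rates, t, o), won) for t, o, won in tournament]
print(round(log_loss(scored), 3))
```

A lower log loss means the predicted probabilities were closer to what actually happened. A model that assigned 50 percent to every game would score about 0.693, which gives a natural baseline to beat.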

Media Madness

For more on the role of data analysis in predicting the NCAA tournament, see features by The Denver Post and FiveThirtyEight.

About the Author

John Rath is a veteran IT professional and regular contributor at Data Center Knowledge. He has served many roles in the data center, including support, system administration, web development and facility management.
