New Programs Emerge to Train Big Data Scientists
August 22nd, 2013 By: John Rath
The term “big data” has entered the spotlight. But as many have pointed out, big data alone is practically useless. It must be transformed into information and knowledge, with the help of analytics, innovation and data scientists.
But where will those data scientists come from? Universities and corporations with a stake in big data are investing in programs and facilities to train this new generation of technologists.
As a (somewhat) new profession, data science draws on a variety of attributes important to the big data field. Data analysis certainly is not new, but the modern data scientist must leverage today’s tools, work efficiently with the large data sets being generated and managed, and have the right mix of analytical skills and business acumen. That means skills in database query languages, statistics, predictive and advanced analytics, programming, business intelligence and cognitive science, built on a good base in business and mathematics.
Universities have been essential in research and development for data science. Rensselaer Polytechnic Institute recently announced that it intends to build a $100 million center to pursue big data developments and research. The Rensselaer Institute for Data Exploration and Applications (IDEA) will operate as a centralized hub across the university’s five schools, and let students and private investors benefit from the big data advancements discovered within the state-of-the-art research facility.
“The Rensselaer IDEA will maximize the ability of our researchers to harness the expanding possibilities for discovery and innovation in a data-driven, supercomputer-powered, web-enabled, globally interconnected world,” said Rensselaer President Shirley Ann Jackson. “Educated in this context, with new approaches and analytical capabilities, our students—the next generation of discoverers, innovators, and entrepreneurs—will be better equipped to truly change the world.”
IBM Narrows the Big Data Skills Gap
Through the company’s Academic Initiative, IBM launched new curricula focused on big data and analytics, adding nine new academic collaborations to its more than 1,000 partnerships with universities across the globe. IBM has partnered with Georgetown University, George Washington University, Rensselaer Polytechnic Institute and the University of Missouri, and has expanded its partnership with Northwestern University. IBM cites statistics from the U.S. Bureau of Labor Statistics that predict a 24 percent increase in demand for professionals with data analytics skills during the next eight years.
“Leaders in business, education and government must take action to foster a new generation of talent with the technical expertise and unique ideas to make the most of this tsunami of Big Data,” said Richard Rodts, Manager of Global Academic Programs, IBM. “To narrow this skills gap, IBM is committed to partnering with universities around the world to provide students with Big Data and analytics curriculum to make an impact in today’s data-driven marketplace.”
This past spring the University of Wisconsin-Milwaukee began a new fully online program to deliver a graduate certificate in Business Analytics. After completing a gateway course in Analytic Models for Managers, students choose from remaining courses in business forecasting, web mining and analytics, marketing analytics, database marketing, or business intelligence technologies and solutions. Additionally, they will work on group data projects that provide exposure to software tools such as SAS, IBM SPSS and Python.
Numerous other universities have seen the benefit of offering degree programs in big data and analytics. DataInformed maintains a map of the various programs across the United States. Swami Chandrasekaran built a Metromap visualization of the data scientist curriculum, covering statistics, programming, machine learning, natural language processing, data visualization, big data, data ingestion, data munging, and toolbox.
The Market for Data Scientists
Kaggle is a platform for data prediction competitions and a community of data scientists who meet and compete with each other to solve complex data science problems. Its clearinghouse of big data competitions matches big data challenges from big-name companies with a community of over 100,000 data scientists. For instance, GE’s Flight Quest challenge asks data scientists to use provided data to develop a usable and scalable algorithm that delivers a real-time flight profile to the pilot, helping them make flights more efficient and more reliably on time. GE will hand out awards totaling $250,000 for the project.
Kaggle’s Chief Scientist Jeremy Howard told Fast Company that what the data scientists competing in challenges have in common is not a PhD but creativity, along with the use of Coursera, an online education site that partners with top universities around the world. These DIY data scientists are ranked according to the competitions that they have won, with prizes ranging all the way up to $3 million.
As with any other job in technology, another key trait that the data scientist must have is the ability to adapt to the ever-changing landscape of technology, tools and trends in the industry. No matter which angle the data scientist approaches from (business, technical, creative), there is no doubt that demand exists for the analytical skill set of those willing to take on big data challenges.
Pretty nice roundup!
In Germany, the first two-day apprenticeships that certify you as a data scientist are now being offered. Are there comparable developments in other countries?
Guy, posted August 23rd, 2013
You can’t really train a data scientist. You can teach the technical skills in stats and programming, but the amount of interdisciplinary crossover necessary requires a career’s worth of knowledge and skill accumulation that you just can’t get out of some two-year master’s program. The hypothesis-driven nature of experimentation in data science also leans heavily towards the need for a PhD of some sort.
Data 101, posted August 23rd, 2013
John, nice article! We are seeing an increase in businesses seeking specialized skills to help address challenges that arose with the era of big data. The HPCC Systems platform from LexisNexis helps to fill this gap by allowing data analysts themselves to own the complete data lifecycle. Designed by data scientists, ECL is a declarative programming language used to express data algorithms across the entire HPCC platform. Their built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. More at http://hpccsystems.com