A team of University of Alabama graduate students took home top honors at the 2013 SAS Analytics and Data Mining Shootout, which tests students’ ability to wrangle and tame what Michael Adams, professor of statistics and one of the team’s sponsors, called “massive amounts of data.”
“You’re trying to understand a phenomenon and make some various important decisions. It’s a really messy environment in which to work,” Adams said. “It’s like analytical detective work.”
Adams said the role of statisticians and business analysts is to match and merge information in a variety of formats from a variety of sources.
The process begins with a team assessing what is being asked of them. In the competition, the UA team was asked to review hypothetical health campaigns in the state of New Hampshire and rank their cost-effectiveness.
After that, students began working with data that Adams described as “terribly dirty” – often missing, inconsistent or incomplete.
“A lot of time was spent just getting that data ‘cleaned’ and ready to put into a model,” said Kevin Crandall, a team member who was working on an MBA with a concentration in analytics and is currently a law student at Harvard University.
He called the experience useful and applicable to his line of work, especially as big data becomes more commonplace.
The next part of the process was designing mathematical models and choosing which to use. Xuwen Zhu, a graduate student in statistics, said the model choice was the source of much discussion, but the team was ultimately able to work together and choose the model they felt was best.
“I think in statistical modeling, you can never say that your model is the best,” Zhu said. “You can always improve. But given the time period, I think we [did] our best.”
Model selection was one of the six categories on which submissions were judged. However, Rong Zheng, also a graduate student in statistics, said completing the project offered no closure when it came to model selection.
“We didn’t know if it was the right one or the wrong one or if it would make sense in the judges’ eyes,” she said. “At that moment, we only can tell that we’ve finished it.”
Still, not all teams reach that moment of completion. Adams said only 26 of the 62 registered teams submitted a complete project.
“Just submitting a solution was quite an accomplishment,” he said. “Not only did we have the first-place team, we had another team that was in the top six.”
The final step of the project, and another category in which the team was judged, was communication, as determined by a final presentation in Orlando, Fla.
It was the chance to go on that trip that initially lured Zheng and Zhu into staying an extra month during the summer to complete what had been a semester-long endeavor. But in the end, the team members point to rewards more intangible than a Florida vacation.
“There was a lot of searching and trying to learn new things. It was a very good experience,” Semhar Michael, a statistics graduate student on the team, said. “It was a very good hands-on experience to have for the future job market.”
Adams said the problem posed to the competing teams was deliberately challenging and realistic.
“[It was designed to] take you as far from a textbook problem as you can imagine,” he said. “The point of this competition [was] to find out which universities are best preparing students to handle those kind of complex, challenging problems.”