When the COVID-19 pandemic forced most of the world into lockdown in March 2020, U.S. colleges and universities shifted to online learning in a hurry. While most U.S. college students were already accustomed to hybrid coursework—face-to-face courses usually include online components—educators understood that protracted online learning would erode the quality of higher education and jeopardize continuing and future enrollment.
Creating safe, in-person learning was a top priority for Purdue University. Administrators wanted to get students back to campus as soon as possible. One of the most prestigious public universities in the U.S., Purdue is esteemed for its engineering and computer science programs. Administrators recognized that in-house technological know-how could yield an effective contact tracing system.
University executives turned to Ian Pytlarz, a Purdue alumnus and lead data scientist in Institutional Data Analytics + Assessment (IDA+A), for ideas. A division of the provost’s office, IDA+A conducts research and statistical and predictive analysis to support evidence-based decisions on everything from enrollment to retention and academic success to campus operations. Pytlarz leads a team of data scientists who focus on analysis, assessment and governance that give Purdue an accurate picture of the state of the campus community.
“We have access to virtually everything,” says Pytlarz. “We have grades, we can view who’s in class with whom, class schedules, every card swipe in every door, every dining swipe, every gym swipe, fraternity and sorority housing records and wireless transactions on our Wi-Fi network.”
Under normal conditions, these large troves of data allow Purdue to do everything universities usually do, from creating class schedules, serving meals in dining halls and securing access to campus facilities to processing grades and maintaining well-functioning information technology resources. But during the pandemic, Purdue put this data to work to keep students safe.
When the university closed at the beginning of the pandemic, Pytlarz and his team of data scientists were busy. “I knew how serious it was and that we needed to do something big, or this was going to be a mess,” says Pytlarz. With university leadership determined to avoid a long-term shutdown, Pytlarz knew the only way forward was to devise a bold, creative solution. “When we shut down, we began to work on developing what would become one of the most sophisticated digital contact tracing systems in the world.”
To transform these disparate and disconnected data sources into a comprehensive, consent-based map of student behavior and patterns of contact, Pytlarz needed a solution that could reveal relationships among millions of data points. “I knew we could mitigate the threat by understanding who spent time with whom and where,” he says, “and accurately predict where virus transmission might occur.” But making these connections wasn’t going to be easy. Analyzing so many kinds of data demanded an industrial-strength solution.
Pytlarz was already using VMware Greenplum, a massively parallel processing (MPP) database, for other operational projects at the university. It made sense to build on this solution to create the contact tracing system. “We were already using VMware Greenplum as a gray box for all sorts of data at the university,” says Pytlarz. “I knew we could use it as the basis of a contact tracing system because it would allow us to create a solution that would consider all the wireless access point logs to see where and when students were co-located for long periods of time. This meant our executive and medical teams could access all the data they could ever want.”
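The co-location idea Pytlarz describes can be illustrated with a minimal sketch. This is not Purdue's implementation, which ran inside Greenplum. The record layout, the sample data, the function names and the 15-minute threshold are all assumptions made for illustration. The sketch groups hypothetical Wi-Fi sessions by access point, totals the minutes each pair of students overlapped, and lists the contacts that meet the threshold.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical session records: (student_id, access_point, start_minute, end_minute).
# The layout, the values, and the 15-minute threshold below are illustrative assumptions.
SESSIONS = [
    ("s1", "ap_library_2", 0, 60),
    ("s2", "ap_library_2", 30, 90),
    ("s3", "ap_library_2", 55, 70),
    ("s1", "ap_gym_1", 100, 130),
    ("s3", "ap_gym_1", 110, 140),
    ("s4", "ap_dining_3", 0, 45),
]


def colocated_minutes(sessions):
    """Total, per pair of students, the minutes both were on the same access point."""
    by_access_point = defaultdict(list)
    for student, access_point, start, end in sessions:
        by_access_point[access_point].append((student, start, end))

    totals = defaultdict(int)
    for records in by_access_point.values():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(records, 2):
            if a == b:
                continue  # two sessions from the same student are not a contact
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap > 0:
                totals[tuple(sorted((a, b)))] += overlap
    return dict(totals)


def significant_contacts(sessions, student, min_minutes=15):
    """List students who shared an access point with `student` for at least min_minutes."""
    totals = colocated_minutes(sessions)
    return sorted(
        other
        for pair, minutes in totals.items()
        if student in pair and minutes >= min_minutes
        for other in pair
        if other != student
    )
```

With the sample data, calling `significant_contacts(SESSIONS, "s1")` lists both `s2` and `s3`, because `s1` and `s3` overlap briefly in the library and again at the gym. In a production setting the same grouping and overlap logic would more plausibly be expressed as a database query over the access point logs.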
Pytlarz used Greenplum to design a system in which a cluster of servers operates as a single database, so that queries run in parallel and return much faster. By ingesting and rapidly processing millions of data points about student interaction and contact, the university could quickly identify and isolate students who tested positive for COVID-19.
“When we got test results, which came in roughly every hour from our onsite testing facility, that would go into our system, it would pull out every significant contact from that exhaustive list and send that out to the medical team,” says Pytlarz. “And that would all be done within five minutes of our getting a positive test. The medical team always had very up-to-date information on who was at risk, who needed to be tested, and who needed to be pulled out of their dorm and moved to isolation housing. And we were on top of it through the entire pandemic. Every decision university executives made throughout the pandemic was based on data continually flowing from our systems. And we never had a major outbreak.”
Dissolving information silos to support the development of the contact tracing system helped dissolve other silos, too. “When the pandemic arrived, we started breaking down those silos because the pandemic created a great deal of willingness across the institution,” says Pytlarz. By helping different players across campus create openings between departments and operational systems, IDA+A has found more ways to use data to improve how the university operates.
“We’re working with our energy and utilities area right now to improve power plant efficiency,” says Pytlarz. “Purdue runs its own power plant. We’re called the Boilermakers for a reason. We have a lot of boilers.” Pytlarz and his team of data scientists applied the principles they used to develop the contact tracing system to the analysis of utility efficiency. “Our annual utility budget is about USD $20 million. If we can shave a couple of percentage points off that, we’re talking real money,” he says.
And how does the improvement in power efficiency make life better for students? “For the past 10 years, Purdue has frozen tuition at 2012 rates,” says Pytlarz. “We work every day so students can have a more affordable education so that we can be a good value for them.”