Rice researchers win NSF funds to develop distributed programming for speedier analysis
Rice University scientists have been awarded a National Science Foundation grant to develop distributed programming methods to analyze streaming data.
Computer scientists Ang Chen and Eugene Ng will use a three-year, $1.2 million grant to take advantage of programmable elements in the various components that store and deliver data to customers.
According to Chen, the project's principal investigator, the award means that switches, routers and other components standing between end users and data servers can play a more active part in managing and analyzing big data. That could make data networks faster and more efficient, which would be a boon for financial services, social networks, the "internet of things" and many other applications.
The researchers said the range of programmable elements in data networks has expanded to include not only servers but also interface components, field-programmable gate arrays, application-specific integrated circuits and network topology. "Today, all the processing is done at the server, without any processing or computation along the path. We're going to try to change that," said Ng, a professor of computer science and electrical and computer engineering.
“Our vision is to optimize all of these components to achieve a sweet spot in the design space for each application,” said Chen, an assistant professor of computer science and of electrical and computer engineering, who joined Rice in 2017. “We hope to have an approach that can work across different kinds of protocols.”
Ng said common examples of streaming data sources also include fraud-detection monitors and temperature and other environmental sensors that continuously generate data and send it at high speed to servers from all over the world. "Our challenge is to develop a scalable platform that allows programmers to derive real-time insight from data utilizing the technologies we propose," he said.
One likely strategy is to intelligently process and reduce data before it reaches servers, Ng said. That could be accomplished by programming components along the path to handle as much computation as they’re able. “That can allow server clusters to pull down more data, because you’re not just moving data for the sake of moving it. You’re processing it and potentially generating a partial answer to your question.
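The idea of generating a partial answer along the path can be sketched with a toy example. This is not the researchers' system, just a minimal illustration under an assumed workload (a running average over sensor readings): each in-network hop collapses its local batch into a small (count, sum) partial result, and the server only merges those partials instead of ingesting every raw record.

```python
# Illustrative sketch of in-network partial aggregation (hypothetical
# workload: averaging sensor readings). Each hop along the path reduces
# the records it forwards, so the server receives a partial answer
# rather than the full raw stream.

def aggregate_at_hop(records):
    """Reduce a batch of numeric readings to a (count, sum) partial result."""
    return (len(records), sum(records))

def merge_partials(partials):
    """Server-side step: combine per-hop partials into a final average."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(total for _, total in partials)
    return total_sum / total_count

# Three in-network devices each reduce their own local traffic...
batches = ([3.0, 5.0], [4.0], [6.0, 2.0, 4.0])
partials = [aggregate_at_hop(batch) for batch in batches]

# ...and the server merges six readings' worth of data from just three tuples.
print(merge_partials(partials))  # prints 4.0
```

The design point is that (count, sum) pairs merge associatively, so any device on the path can do part of the work without coordinating with the others.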
"I think it's safe to say that there is vast untapped potential in using this emerging hardware for big data processing – and the key word is 'emerging,'" Ng said. "It's new, so very few people have thought about what it can do."
The researchers also plan to study how data flows through networks so they can optimize it on the fly. “Sometimes it matters which stuff you perform first,” Chen said. “It’s not just about where programming capabilities exist in the network but also about organization of the network itself.
“So we’re looking at how an underlying physical network can adapt itself and change the network flow to optimize latency,” he said.
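Chen's point that the order of operations matters can be seen in a toy example with hypothetical numbers: pushing a selective filter ahead of an expensive per-record step yields the same answer while doing far less work, which is exactly the kind of reordering a network-aware optimizer could exploit.

```python
# Toy illustration (hypothetical numbers) of why operator order matters:
# a selective filter placed early shrinks the stream before the costly
# step runs, without changing the result.

calls = {"transform": 0}

def costly_transform(x):
    calls["transform"] += 1   # count how much work each plan performs
    return x * x              # stand-in for an expensive per-record step

readings = list(range(10_000))
keep = lambda x: x % 100 == 0  # selective filter: passes 1 in 100 records

# Plan A: transform every record, then filter -> 10,000 transform calls.
calls["transform"] = 0
transformed = [costly_transform(x) for x in readings]
plan_a = [y for x, y in zip(readings, transformed) if keep(x)]
cost_a = calls["transform"]

# Plan B: filter first, then transform -> only 100 transform calls.
calls["transform"] = 0
plan_b = [costly_transform(x) for x in readings if keep(x)]
cost_b = calls["transform"]

assert plan_a == plan_b        # same answer either way
print(cost_a, cost_b)          # prints 10000 100
```

The same answer is produced by both plans; only the amount of data moved and computed along the way differs, which is the latency the researchers want to optimize.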