Thursday, March 12, 2015

Yanlei Diao: Big Data Analytics for Large-Scale Scientific Applications

Who: Yanlei Diao
When: March 17, 14:30
Where: Univ. Paris Sud, bâtiment 660, amphi Claude Shannon, Rue Noetzlin, 91190 Gif-sur-Yvette

Title: Big Data Analytics for Large-Scale Scientific Applications

Abstract:

As scientific applications are producing data at an unprecedented rate, they have become a main driving force of the big data field.
Meanwhile, intelligent, scalable data management has become crucial to large-scale scientific applications such as computational astrophysics and genomics. In this talk, I present our recent work on platform and algorithm design to support such applications.

First, I show how we designed a new storage system, Claro, based on the recently proposed array model, to store and process scientific data that are inherently noisy and uncertain. We propose a suite of storage and evaluation strategies to support array operations under data uncertainty. Results on Sloan Digital Sky Survey (SDSS) datasets show that our techniques outperform state-of-the-art indexing methods by 1.7x-4.3x for the Subarray operation and by 1-2 orders of magnitude for Structure-Join.

Second, motivated by the need for low-latency genomic data processing, I present our design of a “big and fast” data analytics system, Scalla.
Scalla achieves both scalability and low-latency (real-time) processing in a unified system by seamlessly integrating data parallelism, incremental processing, and distributed resource planning. Scalla outperforms existing fast data systems by 1-2 orders of magnitude in throughput and latency combined. Finally, I show some initial results of applying Scalla in the genomics domain.

Bio:

Yanlei Diao is Associate Professor of Computer Science at the University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on big data analytics, scientific analytics, data streams, uncertain data management, and RFID and sensor data management. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in 1998.

Yanlei Diao is a recipient of the 2013 CRA-W Borg Early Career Award (given to one female computer scientist each year), the IBM Scalable Innovation Faculty Award, and the NSF CAREER Award, and she was a finalist for the Microsoft Research New Faculty Award.
She was a speaker in the Distinguished Faculty Lecture Series at the University of Texas at Austin. Her PhD dissertation, “Query Processing for Large-Scale XML Message Brokering,” received an Honorable Mention for the 2006 ACM SIGMOD Dissertation Award.
She is currently Editor-in-Chief of the ACM SIGMOD Record, Associate Editor of ACM TODS, Area Chair of SIGMOD 2015, and member of the SIGMOD Executive Committee and SIGMOD Software Systems Award Committee.
In the past, she has served as Associate Editor of PVLDB, on the organizing committees of SIGMOD, CIDR, DMSN, and the New England Database Summit, and on the program committees of many international conferences and workshops.
Her research has been strongly supported by industry, with awards from Google, IBM, Cisco, NEC Labs, and the Advanced Cybersecurity Center.
