When: Monday, February 2, at 14:00
Where: PCRI building, room 445
Who: Paolo Papotti
Title: Beyond declarative mapping and cleaning
Abstract:
In the "big data" era, data integration is a popular activity both in
academia and in industry. Integrating hundreds of heterogeneous sources
on a daily basis requires a great amount of manual work in order to have
data that is polished enough to be useful in the final applications,
such as querying and mining. The problem is even harder in practice, as
data is often dirty because of typos, duplicates, and similar errors,
which can lead to poor results in analytic tasks.
Over the last ten years, several successful systems have been proposed to
tackle this challenge with a formal, declarative approach based on
first-order logic. However, despite the positive results, there is still a
gap between these proposals and the leading commercial systems. The
latter are harder to maintain, to debug, and to test, but provide the
level of personalization and detail that is needed to solve
“real-world” problems. In this talk, I will describe some of my results
in tackling mapping and cleaning with a declarative approach, and how
this experience has pushed me to explore a new approach that can take the
best of both worlds.
Short Bio:
Paolo Papotti is a scientist in the Data Analytics center at Qatar Computing Research Institute (QCRI). He holds a Ph.D. degree in computer
science from Roma Tre University (Italy, 2007), where he was also an
Assistant Professor before joining QCRI. He has held visiting appointments at
IBM Almaden (USA) and at UC Santa Cruz (USA). His research topics
are in the general area of information integration and data quality.