Thursday, January 29, 2015

Nicoleta Preda: ANGIE in Wonderland, Friday, February 13, 2 pm, PCRI, room 445


When: Friday, February 13, at 14:00

Where: PCRI building, room 445 (address)

Who: Nicoleta Preda

Title: ANGIE in Wonderland

Abstract:
In recent years, several important content providers, such as Amazon,
MusicBrainz, IMDb, GeoNames, Google, and Twitter, have chosen to
export their data through Web services. To unleash the potential of
these sources for new intelligent applications, the data has to be
combined across different APIs.
To this end, we have developed ANGIE, a framework that maps the
knowledge provided by Web services dynamically into a local knowledge
base. ANGIE represents Web services as views with binding patterns
over the schema of the knowledge base. In this talk, I will focus on
two problems related to our framework.

In the first part, the focus will be on the automatic integration of
new Web services. I will present a novel algorithm for inferring the
view definition of a given Web service in terms of the schema of the
global knowledge base. The algorithm also generates a declarative
script that can transform the call results into results of the view. Our
experiments on real Web services show the viability of our approach.

The second part will address the evaluation of conjunctive queries
under a budget of calls. Conjunctive queries may require an unbounded
number of calls in order to compute the maximal answers. However, Web
services typically allow only a fixed number of calls per session.
Therefore, we have to prioritize query evaluation plans. We are working
on distinguishing, among all plans that could return answers, those plans
that actually will. Finally, I will show an application of this new notion of plans.

Short Bio:
Nicoleta Preda obtained her Ph.D. in computer science from the University Paris-Sud under the supervision of Serge Abiteboul and Ioana Manolescu. Before joining the University of Versailles in 2010, she was a post-doctoral researcher in the database group led by Gerhard Weikum at the Max Planck Institute for Informatics. Her research interests include the enrichment of knowledge bases (KBs) with dynamic data, rule mining, and querying large repositories of semi-structured data. Nicoleta teaches classes on data integration, database systems, XML technologies, and Web services.

Monday, January 19, 2015

Paolo Papotti: Beyond declarative mapping and cleaning, Feb 2, 2015, 2 pm, PCRI, room 445

When: Monday, February 2, at 14:00

Where: PCRI building, room 445

Who: Paolo Papotti

Title: Beyond declarative mapping and cleaning

Abstract:
In the "big data" era, data integration is a popular activity both in academia and in industry. Integrating hundreds of heterogeneous sources on a daily basis requires a great amount of manual work in order to obtain data that is polished enough to be useful in the final applications, such as querying and mining. The problem is even harder in practice, as data is often dirty, containing typos, duplicates, and other errors that can lead to poor results in the analytic tasks.

Over the last ten years, several successful systems have been proposed to tackle this challenge with a formal, declarative approach based on first-order logic. However, despite the positive results, there is still a gap between these proposals and the leading commercial systems. The latter are harder to maintain, debug, and test, but provide the level of personalization and detail that is needed to solve “real-world” problems. In this talk, I will describe some of my results in tackling mapping and cleaning with a declarative approach, and how this experience has pushed me to explore a new approach that takes the best of both worlds.

Short Bio:
Paolo Papotti is a scientist in the Data Analytics center at Qatar Computing Research Institute (QCRI). He holds a Ph.D. degree in computer science from Roma Tre University (Italy, 2007), where he was also an Assistant Professor before joining QCRI. He has had visiting appointments at IBM Almaden (USA) and at UC Santa Cruz (USA). His research topics are in the general area of information integration and data quality.

Tuesday, January 6, 2015

Yanlei Diao: "Supporting Scientific Analytics under Data Uncertainty and Query Uncertainty", PCRI, Jan 16, 2015, 10 am

Title: Supporting Scientific Analytics under Data Uncertainty and Query Uncertainty
 
Location: PCRI (https://www.lri.fr/info.pratiques.php), room 455

Date and time: January 16, 2015, 10 am
Abstract:

Data management is becoming increasingly important in large-scale scientific applications such as computational astrophysics, severe weather monitoring, and genomics. In this talk, I present our recent work to address two major challenges raised by these scientific applications. The first challenge regards “data uncertainty”, due to the fact that scientific measurements are inherently noisy and uncertain. In particular, we address uncertain data management under the array model, which has gained popularity for large-scale scientific data processing due to its performance benefits. We propose a suite of storage and evaluation strategies to support array operations under data uncertainty. Results on Sloan Digital Sky Survey (SDSS) datasets show that our techniques outperform state-of-the-art methods by 1.7x to 4.3x for the Subarray operation and by 1 to 2 orders of magnitude for Structure-Join.
As scientific data continues to grow in size and diversity, it is becoming harder for the user to express her data interests precisely in a formal language like SQL. We refer to this second problem as “query uncertainty.” This leads to a strong need for “interactive data exploration,” a service that efficiently navigates the user through a large data space to identify the objects of interest. We present our initial work on interactive data exploration, with results suggesting that it is possible to predict user interests modeled by conjunctive queries with a small number of samples, while providing interactive performance.

Bio:
Yanlei Diao is Associate Professor of Computer Science at the University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on big data analytics, scientific analytics, data streams, uncertain data management, and RFID and sensor data management. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in 1998. 

Yanlei Diao was a recipient of the 2013 CRA-W Borg Early Career Award (one female computer scientist selected each year), an IBM Scalable Innovation Faculty Award, and an NSF CAREER Award, and she was a finalist for the Microsoft Research New Faculty Award. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin. Her PhD dissertation “Query Processing for Large-Scale XML Message Brokering” won the 2006 ACM-SIGMOD Dissertation Award Honorable Mention. She is currently Editor-in-Chief of the ACM SIGMOD Record, Associate Editor of ACM TODS, Area Chair of SIGMOD 2015, and a member of the SIGMOD Executive Committee and the SIGMOD Software Systems Award Committee. In the past, she has served as Associate Editor of PVLDB, as an organizing committee member of SIGMOD, CIDR, DMSN, and the New England Database Summit, and on the program committees of many international conferences and workshops. Her research has been strongly supported by industry, with awards from Google, IBM, Cisco, NEC Labs, and the Advanced Cybersecurity Center.

Thursday, January 1, 2015

Access to PCRI

Physical Address

Our building’s physical address is:
Université Paris-Sud 11
Bâtiment 650 (PCRI)
Rue Noetzlin, 91190 Gif-sur-Yvette
France

GPS coordinates: 48.712346, 2.168362

Directions using public transportation: two alternatives

  • (Currently the most efficient option by public transportation.) Take RER line B towards Saint-Rémy-lès-Chevreuse, get off at Massy-Palaiseau, then take a 91.06 bus (either 91.06 B or 91.06 C, never 91.06 A; the 91.10 might also work, so ask the driver whether it stops there) to IUT – Pôle d’Ingénierie and walk the last 150 m [map]. To find the bus stop in Massy: [map].
  • Take RER line B towards Saint-Rémy-lès-Chevreuse, get off at Le Guichet, then either
    • take the bus: once you get off at Le Guichet (coming from Paris or the Paris airports), cross under the tracks, exit the station, go around the corner past the café, cross the street, go down the stairs, and cross again to reach the bus station [map]. Take bus 9 (the bus schedules are available here) and get off at “IUT – Pôle d’Ingénierie” (the first stop). The bus ride takes about 4 minutes.
    • walk: coming from Paris, start by crossing the tracks through the underpass. Then follow Rue de Versailles (perpendicular to the tracks), in front of the train station, for two blocks. Turn left onto Rue de la Colline, which goes uphill. When you reach the end of that street (almost at the top), turn right onto Chemin du Bois des Rames. Keep going in that direction into Rue Nicolas Appert. Turn left onto Rue d’Arsonval and continue until you reach Rue Noetzlin. The nearest bus stop is “Moulon”. Allow a 25 to 30 minute walk, depending on your walking speed.