Wednesday, September 21, 2016

Seminar by Stefano Ceri on Thursday September 29th at Télécom ParisTech

Stefano Ceri, Professor at Politecnico di Milano, will give a talk on Thursday, September 29th, 2016, 14:30, in Amphi Jade, Télécom ParisTech, 46 rue Barrault (Paris 13).

 

Data-Driven Genomic Computing

 

Abstract

Genomic computing is a new science focused on understanding the functioning of the genome, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) allows the production of the entire human genome sequence for about US$1,000; many algorithms exist for the extraction of genome features, or "signals", including peaks (enriched regions), mutations, or gene expression (intensity of transcription activity). What is still missing is a system supporting data integration and exploration, giving a “biological meaning” to all the available information; such a system could be used, e.g., to better understand cancer or how the environment influences cancer development.
The GeCo Project (Data-Driven Genomic Computing, an ERC Advanced Grant currently in contract preparation) has the objective of revisiting genomic computing through the lens of basic data management, through models, languages, and instruments; the research group at DEIB is among the few centering their focus on genomic data integration. Starting from an abstract model, we have already developed a system that can query processed data produced by several large genomic consortia, including ENCODE and TCGA; the system internally employs the Spark, Flink, and SciDB data engines, and prototypes can already be accessed from Cineca servers or downloaded from PoliMi servers. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient.
Most diseases have a genetic component, so a system capable of integrating the “big data” of genomics is of paramount importance. Among the objectives of the project is the creation of an open-source system available to biological and clinical research; while the GeCo project will provide public services that use only public data (anonymized and made available for secondary use, i.e., knowledge discovery), the use of the GeCo system within protected clinical contexts will enable personalized medicine, i.e., the adaptation of therapies to the specific genetic features of patients. The most ambitious objective is the development, during the five-year ERC project, of an “Internet for Genomics”, i.e. a protocol for collecting data from consortia and individual researchers, and a “Google for Genomics”, supporting indexing and search over huge collections of genomic datasets.
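To make the abstract's notion of region-based integration concrete, here is a small, purely illustrative sketch (not GeCo's actual model or API; all names and coordinates are invented): intersecting point mutations with annotated gene regions on the same chromosome, the kind of operation such a system must support at scale.

```python
# Illustrative only: the flavor of a region-based genomic operation --
# mapping point mutations to the annotated gene regions that contain them.
genes = [("chr1", 100, 500, "geneA"), ("chr1", 800, 1200, "geneB")]
mutations = [("chr1", 150), ("chr1", 600), ("chr1", 900)]

def mutations_in_genes(regions, points):
    """Map each point mutation to the gene regions containing it
    (half-open intervals [start, end) on a matching chromosome)."""
    hits = []
    for chrom, pos in points:
        for rchrom, start, end, name in regions:
            if chrom == rchrom and start <= pos < end:
                hits.append((chrom, pos, name))
    return hits

print(mutations_in_genes(genes, mutations))
# [('chr1', 150, 'geneA'), ('chr1', 900, 'geneB')]
```

A real system would of course index regions rather than scan them pairwise; the point is only the data model of interval-annotated features.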

 

Bio

Stefano Ceri about himself:
I am professor of Database Systems at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano. I was visiting professor at the Computer Science Department of Stanford University (1983-1990). I was the chairman of the Computer Science Section of DEI (1992-2004), and the chairman of LaureaOnLIne, a fully online curriculum in Computer Engineering (2004-2008).
I was the director of Alta Scuola Politecnica, the school of excellence for top-level master students selected from Engineering, Architecture, and Design Faculties of Politecnico di Milano and Politecnico di Torino (October 2010 - September 2013).
I was an associate editor of ACM Transactions on Database Systems and IEEE Transactions on Software Engineering, and I am currently an associate editor of several international journals. I am co-editor in chief (with Mike Carey) of the book series "Data-Centric Systems and Applications" (Springer-Verlag).
I am a member of the Executive Committee of ALFC - Associazione Lombarda Fibrosi Cistica (April 2013 - April 2016).
I am the recipient of the ACM SIGMOD "Edgar F. Codd Innovations Award" (New York, June 26, 2013). I am an ACM Fellow and a member of Academia Europaea.

Wednesday, April 13, 2016

Seminar by Paolo Papotti on Monday April 18th at Télécom ParisTech

Paolo Papotti, Assistant Professor at Arizona State University, will give a talk on Monday, April 18th, 2016, 15:00, in Amphi Rubis, Télécom ParisTech, 46 rue Barrault (Paris 13).

Data Cleaning in the Big Data era


Abstract

In the “big data” era, data is often dirty for several reasons, such as typos, missing values, and duplicates. The intrinsic problem with dirty data is that it can lead to poor results in analytic tasks. Data cleaning is therefore an unavoidable step in data preparation to obtain reliable data for final applications, such as querying and mining. Unfortunately, data cleaning is hard in practice and requires a great deal of manual work. Several systems have been proposed to increase automation and scalability in the process. They rely on a formal, declarative approach based on first-order logic: users provide high-level specifications of their tasks, and the systems compute optimal solutions without human intervention on the generated code. However, traditional ‘top-down’ cleaning approaches quickly become impractical when dealing with the complexity and variety found in big data. In this talk, we first describe recent results in tackling data cleaning with a declarative approach. We then discuss how this experience has pushed several groups to propose new systems that recognize the central role of users in cleaning big data.
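A minimal sketch of the declarative idea the abstract describes (my own toy example, not any system's actual implementation): a constraint such as the functional dependency zip → city is stated as data, and the engine, not hand-written code, finds the tuples that violate it.

```python
# Declarative cleaning in miniature: declare a functional dependency
# (lhs -> rhs) and detect the groups of records that violate it;
# violating groups are the candidates for repair.
from collections import defaultdict

records = [
    {"name": "Ann",  "zip": "75013", "city": "Paris"},
    {"name": "Bob",  "zip": "75013", "city": "Pariss"},    # typo -> violation
    {"name": "Carl", "zip": "13001", "city": "Marseille"},
]

def fd_violations(rows, lhs, rhs):
    """Return {lhs value: set of conflicting rhs values} for every
    group of rows that agrees on `lhs` but disagrees on `rhs`."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return {k: v for k, v in groups.items() if len(v) > 1}

print(fd_violations(records, "zip", "city"))
# {'75013': {'Paris', 'Pariss'}}  (set order may vary)
```

Deciding which conflicting value to keep is the hard, often human-in-the-loop part that the talk's second half is about.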

Bio

Paolo Papotti is an Assistant Professor of Computer Science in the School of Computing, Informatics, and Decision Systems Engineering (CIDSE) at Arizona State University. He received his Ph.D. in Computer Science from Università degli Studi Roma Tre (Italy, 2007) and, before joining ASU, was a senior scientist at the Qatar Computing Research Institute.

His research focuses on systems that assist users in complex, necessary tasks and that scale to large datasets with efficient algorithms and distributed platforms. His work has been recognized with two “Best of the Conference” citations (SIGMOD 2009, VLDB 2015) and with a best demo award at SIGMOD 2015. He is a group leader for SIGMOD 2016 and an associate editor of the ACM Journal of Data and Information Quality (JDIQ).

Tuesday, February 2, 2016

Seminar by Meghyn Bienvenu on February 26, 2016

Meghyn Bienvenu (CNRS/U. Montpellier, http://www.lirmm.fr/~meghyn/) will present her tutorial on "Ontology-Mediated Query Answering"
(http://www.csw.inf.fu-berlin.de/rw2015/lecturers.html#QueryAnswering)

When: Friday 26/2/2016, from 10 am to noon and from 2 pm to 4 pm
Where: Salle Gilles Kahn
Inria Saclay Île-de-France
Bâtiment Alan Turing
1 rue Honoré d'Estienne d'Orves
Campus de l'École Polytechnique
91120 Palaiseau
GPS coordinates: +48° 42' 52.11", +2° 12' 20.78"
How to get there:
http://www.inria.fr/en/centre/saclay/overview/practical-info/how-to-reach-the-centre

The closest RER station is Lozère (RER B). If you need help getting from Lozère to the seminar, contact me (ioana.manolescu@inria.fr).

Wednesday, January 13, 2016

Seminars by Julia Stoyanovich and Benny Kimelfeld at Télécom ParisTech (21 January 2016)



Julia Stoyanovich (Drexel University) and Benny Kimelfeld (Technion) will give talks on 21 January 2016 at Télécom ParisTech, 46 rue Barrault, Paris, 14:00 in Amphi Saphir.

Portal: A query language for evolving graphs

Julia Stoyanovich, Drexel University, Philadelphia, PA, U.S.A.


Graphs are used to represent a plethora of phenomena, from the Web and social networks, to biological pathways, to semantic knowledge bases. Arguably the most interesting and important questions one can ask about graphs have to do with their evolution. Which Web pages are showing an increasing popularity trend? How does influence propagate in social networks? How does knowledge evolve? In this talk I will present Portal, a declarative language for efficient querying and exploratory analysis of evolving graphs. I will describe an implementation of Portal on top of Apache Spark, an open-source distributed data processing framework, and will demonstrate that careful engineering can lead to good performance. Finally, I will describe our work on a visual query composer for Portal.
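A tiny sketch of what "querying an evolving graph" can mean (my own illustration, not Portal's syntax or data model): edges carry validity intervals, and a snapshot query slices the graph at a given time.

```python
# Hypothetical representation of an evolving graph: each edge is stamped
# with a half-open validity interval [start, end). A snapshot query
# returns the graph as it existed at a single point in time.
edges = [
    ("a", "b", 2010, 2014),
    ("b", "c", 2012, 2016),
    ("a", "c", 2015, 2017),
]

def snapshot(evolving_edges, t):
    """Edges alive at time t."""
    return [(u, v) for u, v, start, end in evolving_edges if start <= t < end]

print(snapshot(edges, 2013))   # [('a', 'b'), ('b', 'c')]
print(snapshot(edges, 2016))   # [('a', 'c')]
```

Trend questions like those in the abstract (growing popularity, propagating influence) then become computations over sequences of such snapshots or over the intervals directly.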

Julia Stoyanovich is an Assistant Professor of Computer Science at the College of Computing and Informatics at Drexel University (Philadelphia, USA). Prior to joining Drexel, she was a Postdoctoral researcher and an NSF/CRA Computing Innovations Fellow at the University of Pennsylvania. Julia received her MS and PhD degrees in Computer Science at Columbia University (New York, USA) in 2003 and 2009, respectively, and her BS in Computer Science and in Mathematics and Statistics at the University of Massachusetts Amherst, USA in 1998. Having graduated from college, Julia spent 5 years in the start-up industry, as a software developer, data architect and database administrator. This experience has motivated her to work with real datasets whenever possible, and to deliver results of her research to the communities of target users, as part of open-source systems or as stand-alone prototypes. Julia's research is in the area of data and knowledge management. Her focus is on developing novel information discovery approaches, with the goal of helping the user identify relevant information, and ultimately transform that information into knowledge. She has recently worked with a wide variety of real datasets, from shopping, dating and collaborative tagging applications, to full-genome association studies and gene expression microarrays, to data-intensive workflows and scientific articles. For more information, see https://www.cs.drexel.edu/~julia/


Database Principles in the Wild

Benny Kimelfeld, Technion, Haifa, Israel

 

Modern technological and social trends, such as mobile computing, blogging, and social networking, produce an enormous amount of often valuable data. At the same time, the means to analyze such data are becoming more accessible with the popularity of business models like cloud computing, open source, and crowdsourcing. But such data pose challenges to traditional database paradigms. Due to the uncontrolled way in which the data is produced, much of it is free text, often in informal natural language, leading to computing environments with high levels of uncertainty and error. In this talk I will describe principled research that I have been pursuing towards systems that facilitate modern data-centric development by unifying key functionalities of databases, text analytics, machine learning, and artificial intelligence.

Benny Kimelfeld is an Associate Professor in the Computer Science Faculty at Technion, Israel. After receiving his Ph.D. from The Hebrew University of Jerusalem, he was a Research Staff Member at IBM Research Almaden and a Computer Scientist at LogicBlox. Benny's research spans a spectrum of both foundational and systems aspects of data management, such as probabilistic and inconsistent databases, information retrieval over structured data, and infrastructure for text analytics. Benny was an invited tutorial speaker at PODS 2014 and a co-chair of the first SIGMOD/PODS workshop on Big Uncertain Data (BUDA). He is a co-chair of the 2016 Web and Databases Workshop (WebDB'16), and he currently serves as an associate editor of the Journal of Computer and System Sciences (JCSS). Benny is a Taub Fellow at Technion, and his research is funded by the Israel Science Foundation (ISF), the United States-Israel Binational Science Foundation (BSF), and DARPA. For more information, see http://www.cs.technion.ac.il/people/bennyk/

Wednesday, December 9, 2015

Jennifer Widom's talk at Télécom ParisTech (28th January 2016)

Seminar – Jennifer Widom, Stanford University

Three Favorite Results

Thursday, January 28th 2016 at Telecom ParisTech, 46 rue Barrault, 75013 Paris
Amphi B 312 – 10:00 am.
 
Registration is free but compulsory, by filling a form at https://bdmi.wp.mines-telecom.fr/2015/12/03/seminar-jennifer-widom-stanford-university/ 

Conventional wisdom says good things come in threes. As an exercise recently, I reflected on the research I’ve conducted over my career to date and selected my three favorite results, which I will cover in this talk. For each one I’ll explain the context and motivation, the result itself, and why it ranks as one of my favorites. I’ll also make an attempt to decipher what the results have in common. The three results span computer science foundations, system implementation, and user interface questions, and they represent three of my favorite research areas: semistructured data, data streams, and uncertain data.

 
Jennifer Widom is the Fletcher Jones Professor of Computer Science and Electrical Engineering at Stanford University, and the Senior Associate Dean for Faculty and Academic Affairs in Stanford’s School of Engineering. She served as chair of the Computer Science Department from 2009-2014. Jennifer received her Bachelor’s degree from the Indiana University Jacobs School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering and the American Academy of Arts & Sciences; she received the ACM-W Athena Lecturer Award in 2015, the ACM SIGMOD Edgar F. Codd Innovations Award in 2007, and a Guggenheim Fellowship in 2000. She has served on a variety of program committees, advisory boards, and editorial boards.

Paris Big Data Management Summit 2016

Paris Big Data Management Summit 2016
March 24th, 2016, Paris, France
—————————————————————————————————————
Venue:   Télécom ParisTech
             46, rue Barrault - 75013 Paris


This is a call for submission and participation for the inaugural Paris Big Data Management Summit, to be held on March 24, 2016. The goal of this all-day summit is to bring together researchers from the greater Paris area with an interest in big data management, together with select industry experts, to discuss our collective research strengths and look for opportunities for future collaborations. The summit will showcase a number of research projects of high relevance and impact, and feature a plenary student poster session broadly covering projects on big data management in the local area. Attendees will also hear from French industry about their data management needs.


Call for Submission
———————————————————
We call for submission from all researchers and graduate students in the greater Paris area and other select areas in France. The topics of interest include, but are not limited to:
- Databases, 
- Data mining, 
- Web data management,  
- Knowledge management, and 
- Broader big data analytics.

We call for submissions from researchers and students in one of two forms:

- Technical talk: Each technical talk is given 15 minutes for presentation, and there will be 6-9 talks at the summit. To submit for a technical talk, we require a 2 page maximum PDF talk abstract (any format, 10 pt font or larger). The organizing committee will review the talk abstracts to make the final selection.

- Poster: We anticipate every research project on big data management in the local area to be presented with a poster. It is a great opportunity for the graduate students on the project to present their ideas and latest results. For a poster submission, we require a 1 paragraph abstract. All posters on related topics will be accepted.

All presentations must be made in English due to the presence of international participants.  

For submission, please visit the following web page: 



Important Dates
——————————————————
Paper or poster submission:  January 8, 2016
Author notification:         January 22, 2016
Registration deadline:       February 29, 2016 (registration may close earlier if the summit venue reaches capacity)
Summit date:                 March 24, 2016

All deadlines are 23:59, Paris time, on the due day.


Event details
——————————————————
The 2016 Paris Big Data Management Summit will be held on March 24, 2016, from 8:30 am to 6:00 pm. The program will consist of keynote speeches, a number of technical talks from you, the participants, and from our industry partners, and finally, a large student poster session and a cocktail social!

Registration is free. Lunch, drinks, and appetizers will be provided.

The event will be held at Telecom ParisTech, located at 46, rue Barrault - 75013 Paris. Transportation options include:
- Metro: line 6, Corvisart station
- RER: RER B to Denfert-Rochereau, then Metro line 6
- Bus: lines 62 (Vergniaud), 21 (Daviel) or 67 (Bobillot)
- Vélib': stations 13022 (27 & 36, rue de la Butte aux Cailles), 13048 (20, rue Wurtz) or 13024 (81, rue Bobillot)
- Autolib': 245, rue de Tolbiac - 189, rue de Tolbiac - 50, bd. Blanqui


——————————————————

For more information, please visit our website:

Monday, June 1, 2015


Hubie Chen: "One Hierarchy Spawns Another: Graph Deconstructions and the Complexity Classification of Conjunctive Queries", LSV ENS Cachan, June 11th 2015, 10.30 am


Name: Hubie Chen
Title: One Hierarchy Spawns Another: Graph Deconstructions and the Complexity Classification of Conjunctive Queries
Date and time: Thursday, June 11th 2015, at 10.30 am
Location: LSV library, ENS Cachan http://www.lsv.ens-cachan.fr/

Abstract: 
We study the classical problem of conjunctive query evaluation, here restricted according to the set of permissible queries.  In this work, this problem is formulated as the relational homomorphism problem over a set of structures A, wherein each instance must be a pair of structures such that the first structure is an element of A. We present a comprehensive complexity classification of these problems, which strongly links graph-theoretic properties of A to the complexity of the corresponding homomorphism problem. In particular, we define a binary relation on graph classes and completely describe the resulting hierarchy given by this relation. This binary relation is defined in terms of a notion which we call graph deconstruction and which is a variant of the well-known notion of tree decomposition. We then use this graph hierarchy to infer a complexity hierarchy of homomorphism problems which is comprehensive up to a computationally very weak notion of reduction, namely, a parameterized form of quantifier-free reductions. We obtain a significantly refined complexity classification of left-hand side restricted homomorphism problems, as well as a unifying, modular, and conceptually clean treatment of existing complexity classifications, such as the classifications by Grohe-Schwentick-Segoufin (STOC 2001) and Grohe (FOCS 2003, JACM 2007).
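As a toy companion to the formulation above (my own sketch, not from the paper): conjunctive query evaluation viewed as homomorphism search, here by brute-force enumeration over directed graphs. The exponential search over all variable assignments is precisely what structural restrictions on the left-hand side (tree decompositions and their variants) are meant to tame.

```python
# Conjunctive query evaluation as the homomorphism problem, in miniature:
# the query is a graph over variables; an answer is an edge-preserving
# mapping of variables to data-graph nodes.
from itertools import product

def homomorphisms(query_edges, data_edges):
    """All mappings h from query variables to data nodes such that
    every query edge (a, b) maps to a data edge (h[a], h[b])."""
    qvars = sorted({v for e in query_edges for v in e})
    nodes = sorted({v for e in data_edges for v in e})
    data = set(data_edges)
    return [
        h
        for assignment in product(nodes, repeat=len(qvars))
        for h in [dict(zip(qvars, assignment))]
        if all((h[a], h[b]) in data for a, b in query_edges)
    ]

# Query: a directed path x -> y -> z; data: the directed triangle 1 -> 2 -> 3 -> 1.
query = [("x", "y"), ("y", "z")]
triangle = [(1, 2), (2, 3), (3, 1)]
print(len(homomorphisms(query, triangle)))  # 3: the path wraps around the cycle
```

Restricting the class A of permissible left-hand structures restricts which query graphs can occur here, which is exactly the parameter the classification in the talk is about.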

In this talk, we will also briefly discuss parameterized complexity classes that we introduced/studied which capture  some of the complexity degrees identified by our classification.

This talk is based on joint work with Moritz Müller that appeared in PODS ’13 and CSL-LICS ’14.