Who: Meng “Jason” Changping, Luis Galárraga
When: May 11, 2015, 5pm
Where: Télécom ParisTech (46 rue Barrault, 75013 Paris), Amphi Jade
A seminar of two talks on mining knowledge from knowledge base networks, organized as part of the Télécom ParisTech research chair on Machine Learning and Big Data.
Who: Luis Galárraga, Télécom ParisTech
Title: Applications of Rule Mining in Knowledge Bases
Abstract: The continuous progress of Information Extraction (IE) techniques has led to the construction of large Knowledge Bases (KBs) containing facts about millions of entities such as people, organizations and places. KBs are important nowadays because they allow computers to understand the real world and are used in multiple domains and applications. Furthermore, the discovery of useful and non-trivial patterns in KBs, known as rule mining, opens the door to multiple applications in the areas of data analysis, prediction and automatic data engineering. In this article we present an overview of our ongoing work on rule mining on KBs and some of its applications. The scale of current KBs, as well as their inherent incompleteness and noise, makes this endeavor challenging.
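To make the notion of rule mining concrete, here is a minimal sketch of how a single candidate rule could be scored against a KB. It is an illustration only and not the speaker's system: the facts, the relation names (marriedTo, livesIn), the rule, and the function rule_stats are all invented for the example, and the score is plain support and confidence.

```python
# Toy knowledge base of (subject, relation, object) triples.
# Every fact here is invented for illustration.
KB = {
    ("anna", "marriedTo", "ben"),
    ("ben", "livesIn", "paris"),
    ("anna", "livesIn", "paris"),
    ("carla", "marriedTo", "dan"),
    ("dan", "livesIn", "rome"),
    ("eve", "marriedTo", "finn"),
    ("finn", "livesIn", "oslo"),
    ("eve", "livesIn", "lima"),
}


def rule_stats(kb):
    """Score the rule: marriedTo(x, z) AND livesIn(z, y) => livesIn(x, y)."""
    married = [(s, o) for s, r, o in kb if r == "marriedTo"]
    lives = [(s, o) for s, r, o in kb if r == "livesIn"]
    # Every (x, y) pair that the rule body predicts.
    predictions = {(x, y) for (x, z) in married for (z2, y) in lives if z == z2}
    # Support: predictions that already appear as facts in the KB.
    support = sum((x, "livesIn", y) in kb for x, y in predictions)
    confidence = support / len(predictions) if predictions else 0.0
    return support, len(predictions), confidence


support, n_predictions, confidence = rule_stats(KB)
print(support, n_predictions, round(confidence, 2))
```

In this toy KB the rule makes three predictions and only one is a known fact. The prediction for carla is simply absent from the KB rather than contradicted, which is a small instance of the incompleteness problem the abstract mentions.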
Who: Meng “Jason” Changping, PhD candidate, Purdue University
Title: Discovering Meta-Paths in Large Heterogeneous Information Networks
Abstract: The Heterogeneous Information Network (HIN) is a graph data model in which nodes and edges are annotated with class and relationship labels. Large and complex datasets, such as Yago or DBLP, can be modeled as HINs. Recent work has studied how to make use of these rich information sources. In particular, meta-paths, which represent sequences of node classes and edge types between two nodes in a HIN, have been proposed for tasks such as information retrieval, decision making, and product recommendation. Current methods assume that meta-paths are supplied by domain experts. However, in a large and complex HIN, identifying meta-paths manually can be tedious and difficult. We thus study how to discover meta-paths automatically. Specifically, users are asked to provide example pairs of nodes that exhibit high proximity. We then investigate how to generate meta-paths that can best explain the relationship between these node pairs. Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. We also present a data structure to enable efficient execution of this algorithm. We further incorporate hierarchical relationships among node classes in our solutions. Extensive experiments on real-world HINs show that our approach captures important meta-paths in an efficient and scalable manner.
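As a rough illustration of the idea, the sketch below enumerates meta-paths between user-supplied example pairs in a tiny HIN and then greedily keeps the meta-paths that explain the most pairs. It is a simplified coverage heuristic under stated assumptions, not the algorithm or data structure presented in the talk: the graph, the class and edge labels, and the functions meta_paths and greedy_select are all hypothetical, and class hierarchies are ignored.

```python
from collections import defaultdict

# Toy HIN, entirely invented: each node has a class, each edge a label.
NODE_CLASS = {"alice": "Author", "bob": "Author", "carol": "Author",
              "p1": "Paper", "p2": "Paper", "kdd": "Venue"}
EDGES = [("alice", "writes", "p1"), ("bob", "writes", "p1"),
         ("carol", "writes", "p2"), ("p1", "publishedIn", "kdd"),
         ("p2", "publishedIn", "kdd")]

# Adjacency list; each edge can also be traversed in the inverse direction.
ADJ = defaultdict(list)
for u, label, v in EDGES:
    ADJ[u].append((label, v))
    ADJ[v].append((label + "^-1", u))


def meta_paths(src, dst, max_len=4):
    """Return all meta-paths (alternating node classes and edge labels)
    along simple paths of at most max_len edges from src to dst."""
    found = set()
    stack = [(src, (NODE_CLASS[src],), {src})]
    while stack:
        node, path, seen = stack.pop()
        if node == dst and len(path) > 1:
            found.add(path)
            continue
        if (len(path) - 1) // 2 >= max_len:  # number of edges used so far
            continue
        for label, nxt in ADJ[node]:
            if nxt not in seen:
                stack.append((nxt, path + (label, NODE_CLASS[nxt]), seen | {nxt}))
    return found


def greedy_select(pairs, k=2):
    """Greedily pick up to k meta-paths covering the most unexplained pairs."""
    covers = defaultdict(set)
    for pair in pairs:
        for mp in meta_paths(*pair):
            covers[mp].add(pair)
    chosen, uncovered = [], set(pairs)
    while uncovered and covers and len(chosen) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda mp: len(covers[mp] & uncovered))
        if not covers[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= covers.pop(best)
    return chosen


examples = [("alice", "carol"), ("bob", "carol")]
for mp in greedy_select(examples):
    print(" ".join(mp))
```

Here both example pairs are explained by the same meta-path (two authors whose papers appear in the same venue), so the greedy step selects that single meta-path and stops. Exhaustive enumeration like this would not scale to a real HIN, which is why the abstract calls for a dedicated data structure.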