Wednesday, May 6, 2015

Meng “Jason” Changping, Luis Galárraga: Mining Knowledge from Knowledge Base Networks


Who: Meng “Jason” Changping, Luis Galárraga
When: May 11, 2015, 5pm
Where: Télécom ParisTech (46 rue Barrault, 75013 Paris), Amphi Jade
A seminar of two talks on mining knowledge from knowledge base networks, organized as part of the Télécom ParisTech research chair on Machine Learning and Big Data.

Who: Luis Galárraga, Télécom ParisTech
Title: Applications of Rule Mining in Knowledge Bases
Abstract: The continuous progress of Information Extraction (IE) techniques has led to the construction of large Knowledge Bases (KBs) containing facts about millions of entities such as people, organizations and places. KBs are important nowadays because they allow computers to understand the real world and are used in multiple domains and applications. Furthermore, the discovery of useful and non-trivial patterns in KBs, known as rule mining, opens the door for multiple applications in the areas of data analysis, prediction and automatic data engineering. In this article we present an overview of our ongoing work on rule mining on KBs and some of its applications. The scale of current KBs as well as their inherent incompleteness and noise make this endeavor challenging.

Who: Meng “Jason” Changping, PhD candidate, Purdue University
Title: Discovering Meta-Paths in Large Heterogeneous Information Networks
Abstract: The Heterogeneous Information Network (HIN) is a graph data model in which nodes and edges are annotated with class and relationship labels. Large and complex datasets, such as Yago or DBLP, can be modeled as HINs. Recent work has studied how to make use of these rich information sources. In particular, meta-paths, which represent sequences of node classes and edge types between two nodes in a HIN, have been proposed for such tasks as information retrieval, decision making, and product recommendation. Current methods assume meta-paths are found by domain experts. However, in a large and complex HIN, retrieving meta-paths manually can be tedious and difficult. We thus study how to discover meta-paths automatically. Specifically, users are asked to provide example pairs of nodes that exhibit high proximity. We then investigate how to generate meta-paths that can best explain the relationship between these node pairs. Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. We also present a data structure to enable efficient execution of this algorithm. We further incorporate hierarchical relationships among node classes in our solutions. Extensive experiments on real-world HINs show that our approach captures important meta-paths in an efficient and scalable manner.
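
To make the notion of a meta-path concrete, here is a minimal Python sketch. It is not the speakers' algorithm or data structure: the toy network, the names `NODE_CLASS`, `EDGES`, `meta_paths_between` and `greedy_select`, and the selection criterion (a simple greedy set cover over the example pairs) are all assumptions made for illustration. It enumerates the meta-paths linking each example pair up to a small length, then repeatedly picks the meta-path that explains the most pairs not yet explained.

```python
from collections import defaultdict

# Hypothetical toy HIN: every node has a class, every edge has a type.
NODE_CLASS = {
    "alice": "Person", "bob": "Person", "carol": "Person", "dave": "Person",
    "p1": "Paper", "p2": "Paper",
    "purdue": "University",
}
EDGES = [
    ("alice", "wrote", "p1"), ("bob", "wrote", "p1"),
    ("carol", "wrote", "p2"), ("dave", "wrote", "p2"),
    ("alice", "studiedAt", "purdue"), ("bob", "studiedAt", "purdue"),
    ("carol", "studiedAt", "purdue"),
]


def build_adjacency(edges):
    """Index edges in both directions; inverse edges get a '^-1' suffix."""
    adj = defaultdict(list)
    for source, relation, target in edges:
        adj[source].append((relation, target))
        adj[target].append((relation + "^-1", source))
    return adj


def meta_paths_between(adj, source, target, max_len):
    """Meta-paths (class, edge type, class, ...) of simple paths from
    source to target that use at most max_len edges."""
    found = set()
    stack = [(source, (NODE_CLASS[source],), {source})]
    while stack:
        node, path, visited = stack.pop()
        if node == target and len(path) > 1:
            found.add(path)
            continue
        if (len(path) - 1) // 2 >= max_len:  # edge budget exhausted
            continue
        for relation, nxt in adj[node]:
            if nxt not in visited:
                stack.append(
                    (nxt, path + (relation, NODE_CLASS[nxt]), visited | {nxt})
                )
    return found


def greedy_select(adj, pairs, max_len=2, k=2):
    """Assumed criterion: greedily pick up to k meta-paths, each time the
    one covering the most example pairs that are still unexplained."""
    coverage = defaultdict(set)
    for pair in pairs:
        for meta_path in meta_paths_between(adj, pair[0], pair[1], max_len):
            coverage[meta_path].add(pair)
    selected, uncovered = [], set(pairs)
    while coverage and uncovered and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(coverage), key=lambda mp: len(coverage[mp] & uncovered))
        if not coverage[best] & uncovered:
            break
        selected.append(best)
        uncovered -= coverage[best]
    return selected


adj = build_adjacency(EDGES)
example_pairs = [("alice", "bob"), ("carol", "dave"), ("alice", "carol")]
selected = greedy_select(adj, example_pairs)
for meta_path in selected:
    print(" -> ".join(meta_path))
```

On this toy network, the two selected meta-paths correspond to "studied at the same university" and "co-authored a paper". Exhaustive enumeration like this would not scale to a network the size of Yago or DBLP, which is the difficulty the talk addresses.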