Tutorial: “Semantic data mining for knowledge acquisition”

At K-CAP 2017 (9th International Conference on Knowledge Capture), in Austin, Texas, USA

The tutorial takes place in Room Trinity on the afternoon of Monday, 4 December 2017 (see the conference schedule).

Presenter: Agnieszka Ławrynowicz. Asst. Prof., Poznan University of Technology

Abstract

Semantic data mining is a data mining approach where domain ontologies are used as background knowledge. The challenge is to mine knowledge encoded in domain ontologies and knowledge graphs in addition to purely empirical data. The tutorial will be largely based on the recently published book entitled “Semantic data mining. An ontology-based approach” by the author of this tutorial.

The tutorial intends to provide a synthetic, unifying view on semantic data mining and its application to knowledge acquisition. It also specifically aims to present major research challenges arising from the peculiarities of semantic data mining (proper consideration of the semantics of background knowledge, dealing with the Open World Assumption, semantic similarity measures).

The tutorial will also incorporate some of the recent advances in the area, namely ‘semantic’ embeddings (embedding ontological background knowledge into neural networks).

The intended participants of the tutorial are researchers in the fields of semantic technologies, knowledge engineering, data science, and data mining, as well as developers of knowledge-based systems and applications.

Overview of content, description of the aims, presentation style, potential/preferred prerequisite knowledge

The tutorial will be largely based on the recently published book entitled “Semantic data mining. An ontology-based approach” [1] by the author of this proposal. It will also incorporate some of the recent advances in the area, namely ‘semantic’ embeddings (embedding ontological background knowledge into neural networks), and will show how to apply semantic data mining techniques to knowledge acquisition, giving special attention to the refinement of knowledge graphs.

The planned (tentative) schedule of the tutorial is as follows:

14:00 – 15:30 Introduction, Semantic Data Mining Tasks and Peculiarities (slides)

Introduction (20 min.)
Basics of semantic data mining (20 min.)
- Data mining as search
- Generality relations
- Refinement operators
Tasks of semantic data mining (20 min.)
- Pattern mining quiz
- Concept learning
- Similarity-based methods
Peculiarities of semantic data mining (30 min.)
- Dealing with the Open World Assumption quiz
- What is a Truly ‘Semantic’ Similarity Measure? quiz

15:30 – 16:00 Coffee break

16:00 – 18:00 Semantic Data Mining for Knowledge Acquisition, Hands-On (slides)

Semantic data mining for knowledge acquisition (60 min.)
- Knowledge graph refinement (mining types, synonymous properties, disjointness, inconsistencies)
- Mining knowledge graphs for enrichment of schemas and ontologies
Hands-on: LeoLOD Swift Linked Data Miner plugin for Protégé (60 min.) https://bitbucket.org/jpotoniec/sldm
Additional endpoint to query DBpedia: https://semantic.cs.put.poznan.pl/blazegraph/sparql
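
To try the additional endpoint before the hands-on session, a query along the following lines may help. This is only a minimal sketch: it assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and is still reachable. The helper names `build_types_query` and `build_request` and the example resource are illustrative assumptions and are not part of the tutorial materials.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint listed in the schedule above; its availability is not guaranteed.
ENDPOINT = "https://semantic.cs.put.poznan.pl/blazegraph/sparql"


def build_types_query(resource_iri: str, limit: int = 5) -> str:
    """Return a SPARQL query listing the rdf:type values of one resource."""
    return (
        "SELECT ?type\n"
        f"WHERE {{ <{resource_iri}> a ?type }}\n"
        f"LIMIT {limit}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Wrap the query in a GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical example resource, chosen only for illustration.
EXAMPLE_IRI = "http://dbpedia.org/resource/Austin,_Texas"

if __name__ == "__main__":
    request = build_request(build_types_query(EXAMPLE_IRI))
    # The request is only printed here; pass it to urllib.request.urlopen
    # to actually send it.
    print(request.full_url)
```

Listing the types asserted for a resource is a small-scale counterpart of the "mining types" step in the knowledge graph refinement part of the session.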

The tutorial will combine a standard slide presentation, interleaved with Q&A, with practical parts such as short quizzes and hands-on exercises to be carried out by the participants.

The specific aims of the tutorial are:

to provide a synthetic, unifying view on the field,
to present major research challenges arising from the peculiarities of semantic data mining (proper consideration of the semantics of background knowledge, dealing with the Open World Assumption),
to demonstrate how semantic data mining can be used in practice for knowledge acquisition.

Some background knowledge of logical knowledge representation, especially the basics of RDF and OWL, will be helpful. However, the tutorial will also be accessible to attendees without background knowledge of these representations. It does not assume any prerequisite knowledge of data mining.

Technical requirements

The tutorial will include a hands-on session using an open source tool, a plugin for Protégé named LeoLOD Swift Linked Data Miner [3]. Protégé is publicly and freely available, and the LeoLOD Swift Linked Data Miner is also publicly available at https://bitbucket.org/jpotoniec/sldm.

Participants should bring their personal laptops.

Motivation on why the topic is of particular interest at this time

Semantic data mining is attracting growing interest as knowledge acquisition becomes increasingly statistical. Traditional knowledge engineering techniques are being complemented with machine learning, and machine learning methods, in turn, need to properly consume complex representations.

The author of this proposal co-organized the Semantic Data Mining tutorial as part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML/PKDD) in 2011. In recent years, the topic of data mining with and from ontologies and knowledge graphs has attracted increased interest in major Semantic Web, knowledge engineering, data mining and artificial intelligence journals and at major conferences.

Several workshop series have largely addressed the topic of semantic data mining, such as the Workshop on Knowledge Discovery and Ontologies (KDO), the Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS), the Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery (SoKD), the Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (Know@LOD), the Workshop on Linked Data for Knowledge Discovery (LD4KD) and others.

Also, one of the keynotes at this year’s International Semantic Web Conference (ISWC 2017), to be given by Prof. Nada Lavrač, is devoted to the topic of semantic data mining. The Semantic Web Journal currently manages a special issue on “Machine Learning for Knowledge Base Generation and Population”. A new journal, “Data Science”, discusses in its first issue the topic of using symbolic, semantic knowledge in data mining [2].

Relation to the main conference topics

This tutorial is related to several topics of K-CAP 2017, including, in particular:

Extracting knowledge graphs from unstructured/semi-structured and multi-media data,
Statistical analysis from Web data,
Enrichment and cleaning of knowledge graphs and alignment to existing graphs,
Hybrid approaches for knowledge capture combining knowledge engineering and machine learning,
Knowledge acquisition,
Knowledge extraction,
Similarity measurement and Analogy-based reasoning

Presenter’s bio:

Agnieszka Ławrynowicz is an Assistant Professor at the Institute of Computing Science at Poznan University of Technology (PUT), where she also received her Ph.D. in Computer Science (with distinction) in 2009. She also holds a French-Polish DESS Certificate of Ability to Manage Companies (Poznan University of Economics & University of Rennes 1), obtained through post-graduate studies in 2003. Her research interests include artificial intelligence methods, mostly knowledge representation (ontologies), knowledge discovery and the Semantic Web. Before joining academia, she worked in industry (Empolis, Bridgestone), and she has also led the PUT teams in R&D projects: France Telecom (Orange Labs) on spoken language understanding (2011) and Volkswagen (2017) on Big Data analytics of manufacturing data. She held an EU Marie-Curie fellowship within the PERSONET project on Web mining at the University of Ulster (2004). She was the PUT team leader in the EU FP7 project e-LICO (“An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science”) (2010-2012). She is a Laureate of the Foundation for Polish Science under the POMOST program for the project LeoLOD “Learning and Evolving Ontologies from Linked Open Data” (2013-2015). She is a member of the Polish Artificial Intelligence Society, ECCAI and IAOA. She initiated and co-organized a series of international workshops on Inductive Reasoning and Machine Learning from the Semantic Web (IRMLeS) co-located with a major Semantic Web conference (ESWC’2009-2011). She has also served as co-organizer of several events related to knowledge engineering: co-chairing the Workshop on Ontology and Semantic Web Patterns (WOP2014, WOP2016), the OWL: Experiences and Directions Workshop (OWLED 2015), and machine learning tracks at the Extended Semantic Web Conference (ESWC2011, ESWC2014), and serving as the workshops chair at ESWC2017.
She serves as a PC member at the major conferences and workshops related to knowledge acquisition: ISWC, ESWC, EKAW, K-CAP, WWW, WOP, OWLED, SEMANTiCS. She is the co-editor of the Special Issue on Machine Learning for Knowledge Base Generation and Population of the Semantic Web Journal. She is a co-chair of the W3C Machine Learning Schema Community Group. She was a tutor at the tutorials on Semantic Data Mining (at ECML/PKDD 2011) and Modular Ontology Modeling with Ontology Design Patterns (at ESWC 2017). She has been lecturing on the subjects of semantic technologies, knowledge engineering, computational logics, and data mining for more than 10 years.

References

[1] Agnieszka Ławrynowicz. 2017. Semantic data mining. An ontology-based approach. Studies on the Semantic Web, Vol. 29. IOS Press/AKA Verlag.

[2] Robert Hoehndorf and Nuria Queralt-Rosinach. [n. d.]. Data science and symbolic AI: Synergies, challenges and opportunities. Data Science ([n. d.]). https://doi.org/10.3233/ds-170004

[3] Jędrzej Potoniec and Agnieszka Ławrynowicz. 2016. A Protégé Plugin with Swift Linked Data Miner. In Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 19, 2016. http://ceur-ws.org/Vol-1690/paper48.pdf