Project SEMINTEC - Semantically-enabled data mining techniques

SEMINTEC - Semantically-enabled data mining techniques (2005-2010)

Project description

The SEMINTEC project investigates data mining techniques that take into account background knowledge about a given domain, provided in the form of an ontology. Prior knowledge (the ontology) may steer the search process towards directions that are interesting for the given domain and prune the hypothesis space; moreover, it may give more insight into, and a better understanding of, the obtained results. In most current knowledge discovery methods, background knowledge is implicit or lacks formal structure and semantics, so in practice it can only be taken into account by the human analyst. We aim to use explicitly provided domain knowledge with formal, well-defined semantics during the knowledge discovery process. In particular, our goals are:

- developing algorithms for Semantic Web mining,

- developing algorithms for data mining in relational databases, supported by ontologies as prior knowledge,

- developing algorithms that discover patterns represented in highly expressive languages (subsets of SWRL, e.g. OWL DLP and DL-safe rules).
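To make the idea of ontology-driven pruning more concrete, the following is a minimal sketch (not the SEMINTEC algorithm) of a top-down search for frequent classes over a class hierarchy. Because every instance of a subclass is also an instance of its superclass, a subclass can never have higher support than its superclass; once a class falls below the minimum support, its subclasses need not be refined from it. The taxonomy, the individuals and the threshold are hypothetical toy data.

```java
import java.util.*;

public class TaxonomyPruningSketch {
    // Hypothetical toy taxonomy: class -> direct subclasses
    static final Map<String, List<String>> SUBCLASSES = Map.of(
            "Thing", List.of("Client", "Account"),
            "Client", List.of("Man", "Woman"),
            "Account", List.of("AccountWithLoan"),
            "AccountWithLoan", List.of("AccountWithBadLoan"));

    // Hypothetical ABox: individual -> its most specific asserted class
    static final Map<String, String> ASSERTED = Map.of(
            "c1", "Man", "c2", "Woman", "c3", "Woman",
            "a1", "AccountWithBadLoan", "a2", "Account", "a3", "Account");

    // The class itself plus all of its (transitive) subclasses
    static Set<String> descendantsOrSelf(String cls) {
        Set<String> result = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(cls);
        while (!stack.isEmpty()) {
            String c = stack.pop();
            if (result.add(c)) {
                for (String sub : SUBCLASSES.getOrDefault(c, List.of())) {
                    stack.push(sub);
                }
            }
        }
        return result;
    }

    // Support = number of individuals that are instances of cls (directly or via a subclass)
    static int support(String cls) {
        Set<String> covered = descendantsOrSelf(cls);
        int count = 0;
        for (String type : ASSERTED.values()) {
            if (covered.contains(type)) count++;
        }
        return count;
    }

    // Breadth-first, top-down search: a class is refined into its subclasses only if it is frequent
    static List<String> frequentClasses(int minSupport, List<String> tested) {
        List<String> frequent = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(List.of("Thing"));
        Set<String> seen = new HashSet<>();
        while (!queue.isEmpty()) {
            String c = queue.poll();
            if (!seen.add(c)) continue;
            tested.add(c);
            if (support(c) >= minSupport) {
                frequent.add(c);
                queue.addAll(SUBCLASSES.getOrDefault(c, List.of()));
            }
            // else: c's subclasses are not refined from c, since none of them can reach minSupport
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<String> tested = new ArrayList<>();
        List<String> frequent = frequentClasses(2, tested);
        System.out.println("Frequent: " + frequent);
        System.out.println("Tested:   " + tested);
    }
}
```

With a minimum support of 2, the class AccountWithBadLoan is never tested, because its superclass AccountWithLoan already has support 1.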

Personnel involved

Joanna Józefowska, Ph.D., Dr. Habil., Full Professor

Agnieszka Ławrynowicz, Ph.D., Assistant Professor

Tomasz Łukaszewski, Ph.D., Assistant Professor

Funding

The project was partially funded by the Polish Ministry of Scientific Research and Information Technology/Polish Ministry of Science and Higher Education (grant numbers KBN 3T11F 025 28 and N N516 186437).

Downloads

Financial ontology

For our work on Semantic Web mining we needed a complete knowledge base, but we faced the problem that only a few ontologies with an assertional component are available online, whereas ontologies with only a terminological component are easy to find. We therefore decided to use the financial dataset known from the PKDD'99 Discovery Challenge (http://lisp.vse.cz/pkdd99/DATA/data_berka.zip). On the basis of its relational schema and problem description, we manually created a simple ontology in Protégé. We then imported the text files from the original dataset into a relational database, converted part of the database into instances of the ontology, and manually appended this part to the ontology file created in Protégé.
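The step of turning database rows into ontology instances can be illustrated with a small sketch. It assumes a simplified, hypothetical loan table (loan id, account id, amount) and hypothetical class and property names (:Loan, :Account, :hasLoan, :hasAmount); it emits OWL 2 functional-style assertions purely for readability and is not the conversion code actually used to build the financial ontology.

```java
import java.util.*;

public class RowsToAssertionsSketch {
    // Hypothetical, simplified row of a loan table
    record LoanRow(int loanId, int accountId, int amount) {}

    // Turn one relational row into OWL 2 functional-style ABox assertions
    static List<String> toAssertions(LoanRow row) {
        String loan = ":loan_" + row.loanId();
        String account = ":account_" + row.accountId();
        List<String> axioms = new ArrayList<>();
        // Each row and each referenced key becomes a named individual with a class
        axioms.add("ClassAssertion(:Loan " + loan + ")");
        axioms.add("ClassAssertion(:Account " + account + ")");
        // The foreign key becomes an object property between the two individuals
        axioms.add("ObjectPropertyAssertion(:hasLoan " + account + " " + loan + ")");
        // A plain column becomes a data property with a typed literal
        axioms.add("DataPropertyAssertion(:hasAmount " + loan
                + " \"" + row.amount() + "\"^^xsd:integer)");
        return axioms;
    }

    public static void main(String[] args) {
        // Made-up example rows, not taken from the PKDD'99 data
        List<LoanRow> rows = List.of(new LoanRow(1, 10, 1000), new LoanRow(2, 11, 2000));
        // LinkedHashSet keeps insertion order and drops duplicate assertions
        Set<String> abox = new LinkedHashSet<>();
        for (LoanRow r : rows) {
            abox.addAll(toAssertions(r));
        }
        abox.forEach(System.out::println);
    }
}
```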

Short description of the financial domain

The financial dataset describes a bank that offers services (such as account management and loans) to private persons. The data describes the accounts of bank clients, the loans granted, the credit cards issued, etc. One client can have several accounts, and several clients can operate a single account. Several credit cards can be issued for one account, but at most one loan can be granted per account. Some additional demographic data about clients, such as age, sex or address, is also publicly available. More information about the original dataset can be found at http://lisp.vse.cz/pkdd99/Challenge/berka.htm. Compared with the original dataset, our ontology currently contains no data about transactions and only part of the demographic data. The current ontology is in the OWL DLP fragment. In the future we plan to publish ontologies in more expressive languages here.
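The relationships described above (a many-to-many link between clients and accounts, and at most one loan per account) can be sketched as a small in-memory model. The record types, identifiers and sample data below are hypothetical, and the "at most one loan" constraint is simply checked procedurally here; this is not how the ontology itself encodes the domain.

```java
import java.util.*;

public class FinancialDomainSketch {
    // Hypothetical many-to-many link: one client may use several accounts and vice versa
    record ClientAccount(String client, String account) {}

    // Hypothetical loan record, each granted for one account
    record Loan(String id, String account) {}

    // All clients that can operate the given account
    static Set<String> clientsOf(String account, List<ClientAccount> links) {
        Set<String> clients = new TreeSet<>();
        for (ClientAccount l : links) {
            if (l.account().equals(account)) clients.add(l.client());
        }
        return clients;
    }

    // Accounts that violate the rule "at most one loan per account"
    static Set<String> accountsWithSeveralLoans(List<Loan> loans) {
        Map<String, Integer> perAccount = new HashMap<>();
        for (Loan l : loans) {
            perAccount.merge(l.account(), 1, Integer::sum);
        }
        Set<String> violations = new TreeSet<>();
        perAccount.forEach((account, n) -> {
            if (n > 1) violations.add(account);
        });
        return violations;
    }

    public static void main(String[] args) {
        List<ClientAccount> links = List.of(
                new ClientAccount("c1", "a1"), new ClientAccount("c2", "a1"),
                new ClientAccount("c1", "a2"));
        List<Loan> loans = List.of(new Loan("l1", "a1"), new Loan("l2", "a2"), new Loan("l3", "a2"));
        System.out.println("Clients of a1: " + clientsOf("a1", links));
        System.out.println("Accounts with more than one loan: " + accountsWithSeveralLoans(loans));
    }
}
```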

The ontology can be downloaded here (3.24 MB). Other, smaller ontologies based on the financial dataset (containing only information about gold credit card holders) are also available: the one used in the experiments for EKAW 2006 can be downloaded here, and another one, without disjunctions, here.

SEMINTEC-ARM (SEMINTEC-Association Rule Miner)

Source code in Java: semintec_src.zip

Javadoc: doc.zip

Example input file with execution setup: exampleSemintecSetup.xml

The implementation depends on KAON2.

Publications & presentations

Józefowska J., Ławrynowicz A., Łukaszewski T., The role of semantics in mining frequent patterns from knowledge bases in description logics with rules, Theory and Practice of Logic Programming, 10(3):251-289, 2010.
Józefowska J., Ławrynowicz A., Łukaszewski T., A Study of the SEMINTEC Approach to Frequent Pattern Mining, Springer, Studies in Computational Intelligence, Vol. 220/2009, Berlin/Heidelberg, 2009, 37-51.
Ławrynowicz A., Frequent Pattern Discovery from Knowledge Bases in Description Logics with Rules, Ph.D. Thesis, Poznan University of Technology, 2008.
Józefowska J., Ławrynowicz A., Łukaszewski T., On Reducing Redundancy in Mining Relational Association Rules from the Semantic Web, The Second International Conference on Web Reasoning and Rule Systems (RR'2008), LNCS 5341, Springer, Karlsruhe, 2008.
Józefowska J., Ławrynowicz A., Łukaszewski T., Combining Answer Caching with Smartcall Optimization in Mining Frequent DL-safe Queries, Late Breaking Papers Session, 18th International Conference on Inductive Logic Programming (ILP'2008), Prague.
Józefowska J., Ławrynowicz A., Łukaszewski T., Materialized views in mining ontology instances, Proceedings of the Poster Track of the 5th European Semantic Web Conference (ESWC2008), Tenerife, Spain, 2008.
Józefowska J., Ławrynowicz A., Łukaszewski T., A study of the SEMINTEC approach to frequent pattern mining, In Proc. PriCKL 2007, ECML/PKDD'2007 Workshop on Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery, Warszawa, 41-52.
Józefowska J., Ławrynowicz A., Łukaszewski T., Frequent pattern discovery in OWL DLP knowledge bases, Lecture Notes in Artificial Intelligence, LNAI 4248, Steffen Staab, Vojtech Svatek (eds.), Managing Knowledge in a World of Networks, EKAW 2006, Podebrady, Czech Republic, 287-302, presentation (.pdf).
Ławrynowicz A., Pattern discovery from ABoxes of OWL DLP knowledge bases (poster), Fourth European Summer School on Ontological Engineering and the Semantic Web (SSSW-06), Cercedilla, Spain, July 2006, Best Poster Award.
Ławrynowicz A., Pattern discovery from the ontological layer of the Semantic Web (poster), KnowledgeWeb PhD Symposium 2006 (KWEPSY2006), Budva, Montenegro, 17 June 2006, co-located with the 3rd European Semantic Web Conference (ESWC'2006), supported by Hoppers@KWeb funding.
Józefowska J., Ławrynowicz A., Łukaszewski T., Faster frequent pattern mining from the Semantic Web, New Trends in Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'06 Conference, Advances in Soft Computing, Springer-Verlag, 2006, 121-130.
Józefowska J., Ławrynowicz A., Łukaszewski T., Towards discovery of frequent patterns in description logics with rules, RuleML 2005 (Rules and Rule Markup Languages for the Semantic Web), Galway, Ireland, 2005, A. Adi, S. Stoutenburg, S. Tabet (eds.), LNCS 3791, Springer-Verlag, 84-97.

Acknowledgements

For our experiments we used the KAON2 engine. We would like to thank Boris Motik for his support.

Institute of Computing Science, Poznan University of Technology, Last modified 6th March 2011