SEMINTEC - Semantically-enabled data mining techniques (2005-2010)

Project description
The SEMINTEC project investigates data mining techniques that take into account background knowledge about a given domain, supplied in the form of an ontology. This prior knowledge (the ontology) can steer the search process in directions that are interesting for the domain and prune the hypothesis space; it can also give more insight into, and a better understanding of, the obtained results. In most current knowledge discovery methods, the background knowledge is implicit or lacks a formal structure or semantics, so in practice only the human analyst can take it into account. We aim to use explicitly provided domain knowledge with formal, well-defined semantics during the knowledge discovery process.
In particular, our goals are:
- developing algorithms for Semantic Web Mining,
- developing algorithms for data mining in relational databases, supported by ontologies as prior knowledge,
- developing algorithms that discover patterns represented in highly expressive languages (subsets of SWRL, e.g. OWL DLP, DL-safe rules).
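
As an illustration of how an ontology can prune the hypothesis space, here is a minimal sketch. It is not the SEMINTEC algorithm; the taxonomy, the individuals and all function names are hypothetical and chosen only for this example. It relies on one property of subsumption: every instance of a subconcept is also an instance of its superconcept, so a subconcept can never be more frequent than its parent, and everything below an infrequent concept can be skipped.

```python
from collections import deque

# Hypothetical toy taxonomy (a tree): concept -> direct subconcepts.
TAXONOMY = {
    "Client": ["CardHolder", "Borrower"],
    "CardHolder": ["GoldCardHolder", "ClassicCardHolder"],
    "Borrower": ["DefaultedBorrower"],
    "GoldCardHolder": [],
    "ClassicCardHolder": [],
    "DefaultedBorrower": [],
}

# Hypothetical individuals with their most specific asserted concepts.
ASSERTED = {
    "c1": {"GoldCardHolder"},
    "c2": {"ClassicCardHolder"},
    "c3": {"ClassicCardHolder"},
    "c4": {"Borrower"},
    "c5": {"Client"},
}


def descendants(concept, taxonomy):
    """Return the concept together with all of its transitive subconcepts."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy.get(current, []))
    return seen


def support(concept, taxonomy, asserted):
    """Fraction of individuals that belong to the concept or to a subconcept."""
    covered = descendants(concept, taxonomy)
    members = [ind for ind, types in asserted.items() if types & covered]
    return len(members) / len(asserted)


def frequent_concepts(root, taxonomy, asserted, min_support):
    """Top-down search that never evaluates subconcepts of an infrequent concept."""
    frequent, evaluated = {}, []
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        evaluated.append(concept)
        value = support(concept, taxonomy, asserted)
        if value >= min_support:
            frequent[concept] = value
            queue.extend(taxonomy.get(concept, []))
        # Otherwise prune: no subconcept can reach a higher support.
    return frequent, evaluated


frequent, evaluated = frequent_concepts("Client", TAXONOMY, ASSERTED, 0.3)
print(sorted(frequent))
print("DefaultedBorrower" in evaluated)
```

With a minimum support of 0.3, `Borrower` is infrequent in this toy data, so `DefaultedBorrower` is never evaluated. Pattern languages such as DL-safe rules are far richer than single concepts, but the same monotonicity argument is what makes this kind of pruning sound.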

Personnel involved
Joanna Józefowska, Ph.D., Dr. Habil., Full Professor
Agnieszka Ławrynowicz, Ph.D., Assistant Professor
Tomasz Łukaszewski, Ph.D., Assistant Professor

Funding
The project was partially funded by the Polish Ministry of Scientific Research and Information Technology / Polish Ministry of Science and Higher Education (under grant numbers KBN 3T11F 025 28 and N N516 186437).

Downloads

Financial ontology
For our work on Semantic Web Mining we needed a complete knowledge base. However, few ontologies with an assertional component are available online, whereas ontologies with only a terminological component are easy to find. We therefore decided to use the financial dataset known from the PKDD'99 Discovery Challenge (http://lisp.vse.cz/pkdd99/DATA/data_berka.zip). On the basis of the relational schema and the problem description, we manually created a simple ontology in Protégé. We then imported the text files from the original dataset into a relational database, converted part of the database into instances of the ontology, and manually appended these instances to the ontology file created in Protégé.
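
The conversion from table rows to ontology instances can be sketched as follows. This is only an illustration of the general idea, not the procedure actually used in the project: the namespace, the class and property names, the sample rows and the assumed file layout (semicolon-separated text with a header row) are all hypothetical.

```python
import csv
import io

# Hypothetical namespace and sample rows; the real vocabulary and file layout may differ.
NS = "http://example.org/financial#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SAMPLE = '"account_id";"district_id"\n1;18\n2;1\n'


def rows_to_triples(text, class_name, id_column, link_columns):
    """Turn each row into a class assertion plus one property assertion per linked column.

    link_columns maps a column name to a pair
    (property name, name prefix of the target individual).
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        subject = f"{NS}{class_name.lower()}_{row[id_column]}"
        triples.append((subject, RDF_TYPE, NS + class_name))
        for column, (prop, target_prefix) in link_columns.items():
            target = f"{NS}{target_prefix}_{row[column]}"
            triples.append((subject, NS + prop, target))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = rows_to_triples(
    SAMPLE, "Account", "account_id", {"district_id": ("hasDistrict", "district")}
)
print(to_ntriples(triples))
```

Each row yields one individual typed with a class from the terminological part and linked to other individuals through object properties; the resulting assertions form the assertional part of the knowledge base.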

Short description of the financial domain
The financial dataset describes a bank that offers services (such as managing accounts and offering loans) to private persons. The data describes the accounts of bank clients, the loans granted, the credit cards issued, etc. One client can have several accounts, and several clients can operate a single account. Several credit cards can be issued for an account, but at most one loan can be granted for it. Some additional demographic data about clients, such as age, sex or address, is also publicly available. More information about the original dataset can be found at http://lisp.vse.cz/pkdd99/Challenge/berka.htm
Compared with the original dataset, our ontology currently contains no data about transactions and only part of the demographic data. The current ontology is in the OWL DLP fragment. In the future we are going to publish ontologies in more expressive languages here.
The ontology can be downloaded here (3.24 MB). Other, small ontologies based on the financial dataset (containing only information about gold credit card holders) are also available: the one that was used in the experiments for EKAW 2006 can be downloaded here, and another one, without disjunctions, here.

SEMINTEC-ARM (SEMINTEC-Association Rule Miner)
Source code in Java: semintec_src.zip
Javadoc: doc.zip
Example input file with execution setup: exampleSemintecSetup.xml
The implementation depends on KAON2.

Publications & presentations
Józefowska J., Ławrynowicz A., Łukaszewski T., The role of semantics in mining frequent patterns from knowledge bases in description logics with rules, Theory and Practice of Logic Programming, 10(3):251-289, 2010.
Józefowska J., Ławrynowicz A., Łukaszewski T., A Study of the SEMINTEC Approach to Frequent Pattern Mining, Springer, Studies in Computational Intelligence, Vol. 220/2009, Berlin/Heidelberg, 2009, 37-51.
Ławrynowicz A., Frequent Pattern Discovery from Knowledge Bases in Description Logics with Rules, Ph.D. Thesis, 2008, Poznan University of Technology.
Józefowska J., Ławrynowicz A., Łukaszewski T., On Reducing Redundancy in Mining Relational Association Rules from the Semantic Web, The Second International Conference on Web Reasoning and Rule Systems (RR'2008), LNCS 5341, Springer, Karlsruhe, 2008.
Józefowska J., Ławrynowicz A., Łukaszewski T., Combining Answer Caching with Smartcall Optimization in Mining Frequent DL-safe Queries, Late Breaking Papers Session, 18th International Conference on Inductive Logic Programming, ILP'2008, Prague.
Józefowska J., Ławrynowicz A., Łukaszewski T., Materialized views in mining ontology instances, Proceedings of the Poster Track of the 5th European Semantic Web Conference (ESWC2008), Tenerife (Spain), 2008.
Józefowska J., Ławrynowicz A., Łukaszewski T., A study of the SEMINTEC approach to frequent pattern mining, in Proc. PriCKL 2007, ECML/PKDD'2007 Workshop on Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery, Warszawa, 41-52.
Józefowska J., Ławrynowicz A., Łukaszewski T., Frequent pattern discovery in OWL DLP knowledge bases, Lecture Notes in Artificial Intelligence, LNAI 4248, Steffen Staab, Vojtech Svatek (eds.), Managing Knowledge in a World of Networks, EKAW 2006, Podebrady, Czech Republic, 287-302, presentation (.pdf)
Ławrynowicz A., Pattern discovery from ABoxes of OWL DLP knowledge bases (Poster), Fourth European Summer School on Ontological Engineering and the Semantic Web (SSSW-06), Cercedilla, Spain, July 2006, the best poster award.
Ławrynowicz A., Pattern discovery from the ontological layer of the Semantic Web (Poster), KnowledgeWeb PhD Symposium 2006 (KWEPSY2006), Budva, Montenegro, 17th June 2006, collocated with 3rd European Semantic Web Conference, ESWC'2006. Thanks to Hoppers@KWeb funding.
Józefowska J., Ławrynowicz A., Łukaszewski T., Faster frequent pattern mining from the Semantic Web, New Trends in Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'06 Conference, Advances in Soft Computing, Springer Verlag, 2006, 121-130.
Józefowska J., Ławrynowicz A., Łukaszewski T., Towards discovery of frequent patterns in description logics with rules, RuleML 2005 (Rules and Rule Markup Languages for the Semantic Web), Galway, Ireland, 2005, A. Adi, S. Stoutenburg, S. Tabet (eds.), LNCS 3791, Springer Verlag, 84-97.

Acknowledgements
For our experiments we used the KAON2 engine. We would like to thank Boris Motik for his support.

Institute of Computing Science, Poznan University of Technology. Last modified 6th March 2011