Marek Wojciechowski's Publications

Chronological list of papers

Marek Wojciechowski, Tadeusz Morzy: 'Wyszukiwanie podobieństw w bazach danych DNA', Raport Instytutu Informatyki Politechniki Poznańskiej RB-020/96, 1996.
Tomasz Biały, Jerzy Brzeziński, Marek Wojciechowski, Andrzej Roszak: 'System informatyczny szpitala', Materiały konf. INFO-MED, Międzyzdroje, 1998.
Marek Wojciechowski, Maciej Zakrzewicz: 'Itemset Materializing for Fast Mining of Association Rules', Proc. of the 2nd East European Conference on Advances in Databases and Information Systems (ADBIS'98), Poznań, Poland, LNCS 1475, © Springer-Verlag, 1998. (pdf, BibTeX)
Mining association rules is an important data mining problem. Association rules are usually mined repeatedly in different parts of a database. Current algorithms for mining association rules work in two steps. First, the most frequently occurring sets of items are discovered, then the sets are used to generate the association rules. The first step usually requires repeated passes over the analyzed database and determines the overall performance. In this paper, we present a new method that addresses the issue of discovering the most frequently occurring sets of items. Our method consists in materializing precomputed sets of items discovered in logical database partitions. We show that the materialized sets can be repeatedly used to efficiently generate the most frequently occurring sets of items. Using this approach, required association rules can be mined with only one scan of the database. Our experiments show that the proposed method significantly outperforms the well-known algorithms.
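The reuse of itemsets precomputed per partition can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a partition-style scheme in which the union of locally frequent itemsets serves as the candidate set, and it relies on the fact that an itemset frequent in the whole database must be frequent in at least one partition. The function names and the brute-force local miner are hypothetical, and the materialized sets are assumed to have been computed with a threshold no higher than the one used in the query.

```python
from itertools import combinations


def local_frequent_itemsets(partition, min_support):
    """Brute-force miner (illustration only): itemsets whose relative
    support inside a single partition is at least min_support."""
    counts = {}
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    threshold = min_support * len(partition)
    return {itemset for itemset, n in counts.items() if n >= threshold}


def frequent_itemsets_one_scan(partitions, materialized, min_support):
    """Use the materialized per-partition itemsets as candidates and
    count them in a single pass over all transactions."""
    candidates = set().union(*materialized)
    counts = dict.fromkeys(candidates, 0)
    total = 0
    for partition in partitions:
        for transaction in partition:
            total += 1
            items = set(transaction)
            for candidate in candidates:
                if candidate <= items:
                    counts[candidate] += 1
    return {c: n / total for c, n in counts.items() if n >= min_support * total}


# Hypothetical toy data: two logical partitions of a transaction table.
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b"}, {"b", "c"}],
]
materialized = [local_frequent_itemsets(p, 0.5) for p in partitions]
result = frequent_itemsets_one_scan(partitions, materialized, 0.5)
```

The materialized sets are computed once and can be reused by later queries with the same or a higher support threshold, so only the final counting pass touches the database.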
Marek Wojciechowski: 'Zalety i wady architektury rozproszonej wykorzystującej migawki tylko do odczytu', Materiały IV konf. PLOUG, Zakopane, 1998. (pdf, BibTeX)
This paper presents the advantages and disadvantages of a distributed architecture that uses data replication based on read-only snapshots, compared with a centralized architecture and with a distributed architecture without replication. It discusses the criteria that should be considered when choosing a system architecture, and the architecture's impact on functionality, reliability, scalability, processing efficiency, and data confidentiality. The paper results from considerations and analyses carried out during work on the design of an information system for hospitals.
Marek Wojciechowski: 'Odkrywanie wzorców zachowań użytkowników WWW', Materiały konf. POLMAN'99, OWN, Poznań, 1999. (pdf, BibTeX)
This paper is devoted to discovering behavior patterns of WWW users by applying data mining techniques to the analysis of a web server log. It presents basic data mining techniques together with examples of their application to a web server log. Particular emphasis is placed on practical applications of the discovered knowledge and on problems specific to web server log analysis that do not arise with other data sources.
Juliusz Jezierski, Marek Wojciechowski: 'System informatyczny Zintegrowanego Monitoringu Środowiska Przyrodniczego', Materiały konf. POLMAN'99, OWN, Poznań, 1999. (BibTeX)
This paper presents the assumptions and architecture of the information system of Zintegrowany Monitoring Środowiska Przyrodniczego (ZMŚP, Integrated Monitoring of the Natural Environment). The system consists of three modules: a windowed application made up of a set of interrelated screen forms, a data access module conforming to the ODBC standard, and a module that makes the data available on the Internet.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Pattern-Oriented Hierarchical Clustering', Proc. of the 3rd East European Conference on Advances in Databases and Information Systems (ADBIS'99), Maribor, Slovenia, LNCS 1691, © Springer-Verlag, 1999. (pdf, BibTeX)
Clustering is a data mining method that consists in discovering interesting data distributions in very large databases. The applications of clustering cover customer segmentation, catalog design, store layout, stock market segmentation, etc. In this paper, we consider the problem of discovering similarity-based clusters in a large database of event sequences. We introduce a hierarchical algorithm that uses sequential patterns found in the database to efficiently generate both the clustering model and data clusters. The algorithm iteratively merges smaller, similar clusters into bigger ones until the requested number of clusters is reached. In the absence of a well-defined metric space, we propose a similarity measure that is used in cluster merging. The advantage of the proposed measure is that no additional access to the source database is needed to evaluate the inter-cluster similarities.
Marek Wojciechowski: 'Mining Various Patterns in Sequential Data in an SQL-like Manner', Proc. of short papers of the 3rd East European Conference on Advances in Databases and Information Systems (ADBIS'99), Maribor, Slovenia, 1999. (pdf, BibTeX)
One of the most important data mining tasks is discovery of frequently occurring patterns in sequences of events. Many algorithms for finding various patterns in sequential data have been proposed recently. Researchers concentrated on different classes of patterns, which resulted in many different models and formulations of the problem. In this paper a uniform formulation of the problem of mining frequent patterns in sequential data is provided together with an SQL-like language capable of expressing queries concerning all classes of patterns. An issue of materializing discovered patterns for further selective analysis is also addressed by introducing a concept of knowledge snapshots.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Clustering of Event Sequences', Technical Report RA-003/99, Poznan University of Technology, 1999.
Marek Wojciechowski: 'Distributing and Replicating Data in Hospital Information Systems', Proc. of the 5th International Conference on Computers in Medicine, Łódź, Poland, 1999. (pdf, BibTeX)
Distributing and replicating data are techniques used to improve the performance and reliability of information systems. This paper presents the advantages and disadvantages of distributed architectures that are important in the case of hospital systems. It discusses several issues that have to be addressed when planning a distributed architecture for a hospital information system. Particular emphasis is placed on problems that might occur in the case of a partial failure of a distributed hospital information system.
Bartłomiej Janas, Tadeusz Morzy, Marek Wojciechowski: 'Analiza porównawcza algorytmów odkrywania wzorców sekwencji', Raport Instytutu Informatyki Politechniki Poznańskiej RB-022/99, 1999.
Maciej Kempiński, Daniel Lorenz, Tadeusz Morzy, Marek Wojciechowski: 'Odkrywanie wiedzy w medycznej bazie danych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-023/99, 1999.
Marek Wojciechowski: 'Discovering and Processing Sequential Patterns in Databases', VII. Conference on Extending Database Technology - PhD Workshop, Konstanz, Germany, 2000.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Data Mining Support in Database Management Systems', Proc. of the 2nd International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000), Greenwich, U.K., LNCS 1874, © Springer-Verlag, 2000. (pdf, BibTeX)
The most popular data mining techniques consist in searching databases for frequently occurring patterns, e.g. association rules, sequential patterns. We argue that in contrast to today's loosely-coupled tools, data mining should be regarded as advanced database querying and supported by Database Management Systems (DBMSs). In this paper we describe our research prototype system, which logically extends DBMS functionality, offering extensive support for pattern discovery, storage and management. We focus on the system architecture and novel SQL-based data mining query language, which serves as the user interface to the system.
Marek Wojciechowski, Maciej Zakrzewicz: 'HASH-MINE: A New Framework for Discovery of Frequent Itemsets', Proceedings of Challenges, Enlarged Fourth East-European Conference on Advances in Databases and Information Systems (ADBIS-DASFAA 2000), Prague, Czech Republic, 2000. (pdf, BibTeX)
Discovery of frequently occurring subsets of items, called itemsets, is the core of many data mining methods. Most of the previous studies adopt Apriori-like algorithms, which iteratively generate candidate itemsets and check their occurrence frequencies in the database. These approaches suffer from serious costs of repeated passes over the analyzed database. To address this problem, we propose a novel method, called HASH-MINE, for reducing database activity of frequent itemset discovery algorithms. The idea of HASH-MINE consists in using hash tables for pruning candidate itemsets. The proposed method requires fewer scans over the source database: the first scan creates hash tables, while the subsequent ones verify discovered itemsets. Its performance improvements have been shown in a series of our experiments.
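Hash-based candidate pruning can be sketched as follows. This is not HASH-MINE itself, whose details are not given here. It is a generic bucket-counting scheme, restricted to pairs of items, that follows the same two-phase shape: one scan fills a hash table, and a later scan verifies the surviving candidates. The function names, the use of `crc32` as the hash function, and the bucket count are all assumptions made for illustration; items are assumed to be strings.

```python
from itertools import combinations
from zlib import crc32


def bucket_of(pair, n_buckets):
    """Deterministic bucket index for a sorted pair of string items."""
    return crc32("|".join(pair).encode("utf-8")) % n_buckets


def frequent_pairs(transactions, min_count, n_buckets=64):
    # Scan 1: count single items and hash every pair into a bucket.
    item_counts = {}
    buckets = [0] * n_buckets
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(items, 2):
            buckets[bucket_of(pair, n_buckets)] += 1

    # Prune: a pair can be frequent only if both items are frequent and
    # its bucket total reaches min_count (a bucket total is an upper
    # bound on the count of every pair hashed into it).
    frequent_items = sorted(i for i, n in item_counts.items() if n >= min_count)
    candidates = [
        pair
        for pair in combinations(frequent_items, 2)
        if buckets[bucket_of(pair, n_buckets)] >= min_count
    ]

    # Scan 2: verify the surviving candidates with exact counts.
    counts = dict.fromkeys(candidates, 0)
    for transaction in transactions:
        items = set(transaction)
        for pair in candidates:
            if pair[0] in items and pair[1] in items:
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical toy data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
pairs = frequent_pairs(transactions, min_count=2)
```

Bucket collisions can only let extra candidates through, never remove a truly frequent pair, so the verification scan keeps the result exact.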
Marek Wojciechowski: 'Discovering Frequent Episodes in Sequences of Complex Events', Proceedings of Challenges, Enlarged Fourth East-European Conference on Advances in Databases and Information Systems (ADBIS-DASFAA 2000), Prague, Czech Republic, 2000. (pdf, BibTeX)
Data collected in many applications take the form of sequences of events. One of the popular data mining problems is discovery of frequently occurring episodes in such sequences. Efficient algorithms discovering all frequent episodes have been proposed for sequences of simple events associated with basic event types. But in many cases events are described by a set of attributes rather than by just one event type attribute. The solutions handling such complex events proposed so far assume that a user provides a template of episodes to be discovered. This assumption does not allow users to discover all surprising relationships between event attributes. In this paper, we propose extensions to algorithms initially designed for simple events, making them capable of handling complex events in the same manner.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Materialized Data Mining Views', Proc. of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000), Lyon, France, LNAI 1910, © Springer-Verlag, 2000. (pdf, BibTeX)
Data mining is a useful decision support technique, which can be used to find trends and regularities in warehouses of corporate data. A serious problem of its practical applications is long processing time required by data mining algorithms. Current systems consume minutes or hours to answer simple queries. In this paper we present the concept of materialized data mining views. Materialized data mining views store selected patterns discovered in a portion of a database, and are used for query rewriting, which transforms a data mining query into a query accessing a materialized view. Since the transformation is transparent to a user, materialized data mining views can be created and used like indexes.
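The query rewriting idea can be illustrated with a minimal sketch. It assumes one simple rewriting rule, which may be narrower than what the paper covers: a view that stores patterns mined from a given data source at some minimum support can answer any query on the same source with an equal or higher minimum support, simply by filtering the stored patterns. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MaterializedMiningView:
    """Patterns mined earlier from `source` with threshold `min_support`."""
    source: str
    min_support: float
    patterns: dict = field(default_factory=dict)  # pattern -> support


def answer_from_view(view, source, min_support):
    """Return the query result computed from the view, or None when the
    view cannot answer the query and the mining algorithm must be run."""
    if view.source != source or min_support < view.min_support:
        return None
    return {p: s for p, s in view.patterns.items() if s >= min_support}


# Hypothetical view contents.
view = MaterializedMiningView(
    source="sales_2000",
    min_support=0.2,
    patterns={frozenset("a"): 0.6, frozenset("b"): 0.4, frozenset("ab"): 0.3},
)
rewritten = answer_from_view(view, "sales_2000", 0.35)
```

A `None` result signals that the query falls outside what the view covers, in which case a system would fall back to mining the source data directly.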
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Web Users Clustering', Proc. of the ISCIS 2000 Conference, Istanbul, Turkey, 2000. (pdf, BibTeX)
Web log mining is a new subfield of data mining research. It aims at discovery of trends and regularities in web users' access patterns. This paper presents a new algorithm for automated segmentation of web users based on their access patterns. The results may lead to an improved organization of the web documents for navigational convenience.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Fast Discovery of Sequential Patterns Using Materialized Data Mining Views', Proc. of the ISCIS 2000 Conference, Istanbul, Turkey, 2000. (pdf, BibTeX)
Most data mining techniques consist in discovery of frequently occurring patterns in large data sets. From a user's point of view, data mining can be seen as advanced querying, where each data mining query specifies the source data set and the requested class of patterns. Unfortunately, current data mining systems consume minutes or hours to answer simple queries, which makes them unsuitable for interactive use. In this paper we present the concept of materialized data mining views and their application to fast discovery of sequential patterns. We show how materialized data mining views can be used to optimize processing of sequential pattern queries.
Marek Wojciechowski, Maciej Zakrzewicz: 'Adaptatywne serwery WWW', Materiały VI konf. PLOUG, Zakopane, 2000. (pdf, BibTeX)
Adaptive web servers use log file analysis to automatically transform the content and structure of the documents they serve. As a result, a web server "adapts" itself to the user's expectations, "guessing" his or her intentions. The paper presents available methods for automated log file analysis and for applying the discovered trends and correlations in the dynamic transformation of web documents.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz, Sławomir Dudziak, Bartosz Głowacki, Marcin Gruszka, Radosław Hofman: 'Integracja algorytmów odkrywania wzorców sekwencji z systemem zarządzania bazą danych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-018/2000, 2000.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Clustering Sequences of Categorical Values', Foundations of Computing and Decision Sciences, Vol. 25, No. 3, 2000. (pdf, BibTeX)
Conceptual clustering is a discovery process that groups a set of data in the way that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. Traditional clustering algorithms employ some measure of distance between data points in n-dimensional space. However, not all data types can be represented in a metric space, therefore no natural distance function is available for them. We address the problem of clustering sequences of categorical values. We present a measure of similarity for the sequences and an agglomerative hierarchical algorithm that uses frequent sequential patterns found in the database to efficiently generate the resulting clusters. The algorithm iteratively merges smaller, similar clusters into bigger ones until the requested number of clusters is reached.
Marek Wojciechowski, Maciej Zakrzewicz: 'Java Server Pages: koncepcje i zastosowania', Materiały I Seminarium PLOUG 'Aplikacje internetowe na platformie Oracle9i', Warszawa, 2001. (pdf, BibTeX)
Java Server Pages (JSP), the latest incarnation of the popular Java servlet technology, is a mechanism for dynamic generation of web documents that enables building portable Internet applications. JSP solves a fundamental problem of Java servlets: it allows procedural code (JavaBeans) to be separated from the definition of the layout of the generated web document. The technology is an interesting alternative to other server-side methods of dynamic web document generation: CGI, ASP, PL/SQL Cartridge, PERL Cartridge. The paper presents basic principles of constructing database JSP applications and using them on the Oracle8i platform. Many of the discussed solutions are illustrated with examples.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Scalable Hierarchical Clustering Method for Sequences of Categorical Values', Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'01), Kowloon, Hong Kong, LNAI 2035, © Springer-Verlag, 2001. (pdf, BibTeX)
Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, there exist many important databases that store categorical data sequences, where significant knowledge is hidden behind sequential dependencies between the data. In this paper we introduce a problem of clustering categorical data sequences and present an efficient scalable algorithm to solve the problem. Our algorithm implements the general idea of agglomerative hierarchical clustering and uses frequently occurring subsequences as features describing data sequences. The algorithm not only discovers a set of high quality clusters containing similar data sequences but also provides descriptions of the discovered clusters.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Optimizing Pattern Queries for Web Access Logs', Proc. of the 5th East European Conference on Advances in Databases and Information Systems (ADBIS'01), Vilnius, Lithuania, LNCS 2151, © Springer-Verlag, 2001. (pdf, BibTeX)
Web access logs, usually stored in relational databases, are commonly used for various data mining and data analysis tasks. The tasks typically consist in searching the web access logs for event sequences that support a given sequential pattern. For large data volumes, this type of searching is extremely time consuming and is not well optimized by traditional indexing techniques. In this paper we present a new index structure to optimize pattern search queries on web access logs. We focus on its physical structure, maintenance and performance issues.
Marek Wojciechowski: 'Interactive Constraint-Based Sequential Pattern Mining', Proc. of the 5th East European Conference on Advances in Databases and Information Systems (ADBIS'01), Vilnius, Lithuania, LNCS 2151, © Springer-Verlag, 2001. (pdf, BibTeX)
Data mining is an interactive and iterative process. It is very likely that a user will execute a series of similar queries, differing in pattern constraints and mining parameters, before he or she gets satisfying results. Unfortunately, currently available data mining algorithms suffer from long processing times, which is unacceptable in the case of interactive mining. In this paper we discuss efficient processing of sequential pattern queries utilizing cached results of other sequential pattern queries. We analyze differences between sequential pattern queries and propose algorithms that in many cases can be used instead of time-consuming mining algorithms.
Marek Wojciechowski, Maciej Zakrzewicz: 'Analiza technik buforowania w aplikacjach internetowych', Materiały VII konf. PLOUG, Zakopane, 2001. (pdf, BibTeX)
Caching of web objects is one of the most rapidly developing techniques for improving the efficiency of Internet applications. The purpose of caching is to store copies of objects away from the place where they are published, so that user requests can be served without contacting the origin web server. Cache management algorithms must solve a number of technical problems related to, among others, refreshing copies, multiversioning, and caching granularity. The paper presents the issues of web caching and surveys cache architectures and cache management methods.
Marek Wojciechowski, Maciej Zakrzewicz: 'SQLJ w Oracle9i - rozszerzenia standardu', Materiały VII konf. PLOUG, Zakopane, 2001. (pdf, BibTeX)
The SQLJ standard concerns embedding SQL statements in Java programs. SQLJ is an alternative to using the JDBC interface directly, allowing more compact and readable programs to be written. In addition, SQLJ provides checking of references to database objects and verification of SQL statement syntax at program compile time. According to the SQLJ standard, SQL statements embedded in a program are static, which can sometimes be a significant limitation. This paper is devoted to the extensions of SQLJ functionality available in Oracle9i, with particular attention to support for dynamic SQL statements.
Mariusz Glaza, Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Metody grupowania danych sekwencyjnych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-004/01, 2001.
Bartłomiej Janas, Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Przyrostowe wykrywanie wzorców sekwencji - przegląd istniejących algorytmów', Raport Instytutu Informatyki Politechniki Poznańskiej RB-005/01, 2001.
Marek Wojciechowski, Maciej Zakrzewicz: 'Kosztowy optymalizator zapytań', Materiały III Seminarium PLOUG 'Efektywność i strojenie systemów baz danych Oracle', Warszawa, 2002. (pdf, BibTeX)
Practically every SQL query submitted to a database management system can be executed in many different ways. The automatic choice of the most efficient way of executing a query (the query execution plan) is the task of the cost-based optimizer module. Knowing how the optimizer works makes it easier for programmers to tune the applications they build. The paper presents the internal mechanisms of the cost-based optimizer in Oracle8i/9i, paying particular attention to the cost function formula, data access methods, join methods, and statistical models of tables and columns.
Juliusz Jezierski, Marek Wojciechowski, Maciej Zakrzewicz: 'Java: komponentowe aplikacje internetowe dla baz danych', Materiały I Szkoły PLOUG, Poznań, 2002.
Marek Wojciechowski, Maciej Zakrzewicz: 'Access Paths for Data Mining Query Optimizer', Proc. of the 5th International Conference on Business Information Systems (BIS 2002), Poznań, Poland, 2002. (pdf, BibTeX)
Data mining research has developed many pattern discovery algorithms dedicated to specific data and pattern characteristics. We argue that a user should not be responsible for choosing the most efficient algorithm to solve a particular data mining problem. Instead, a data mining query optimizer should follow the cost-based optimization techniques to select an appropriate algorithm to solve the user's problem. In this paper we discuss the process of data mining query optimization and we extend the list of choices the optimizer can make.
Mikołaj Morzy, Marek Wojciechowski: 'Integracja technik eksploracji danych z systemem zarządzania bazą danych na przykładzie Oracle9i Data Mining', Materiały V Seminarium PLOUG 'Projektowanie i implementowanie magazynów (hurtowni) danych', Warszawa, 2002. (pdf, BibTeX)
Data mining is finding ever wider application in all areas of everyday life. In recent years we have observed a shift away from specialized, application-oriented mining systems toward general-purpose systems. Such systems require tight integration with the existing database architecture. One proposal for integrating a mining system with a relational database is the Oracle9i Data Mining module. The paper presents the components of the mining module and a methodology for building data mining applications that use this tool.
Mikołaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Cost-Based Sequential Pattern Query Optimization in Presence of Materialized Results of Previous Queries', Intelligent Information Systems 2002, Proceedings of the IIS'2002 Symposium, Sopot, Poland, Advances in Soft Computing, Physica-Verlag, 2002. (pdf, BibTeX)
Data mining is very often regarded as an interactive and iterative process. Users interacting with the data mining system specify the class of patterns of their interest by means of data mining queries involving various types of constraints. It is very likely that a user will execute a series of similar queries before he or she gets satisfying results. Unfortunately, currently available data mining algorithms suffer from long processing times, which is unacceptable in the case of interactive mining. One possible solution, applicable in certain cases, is exploiting materialized results of previous queries when answering a new query. In this paper we discuss cost-based data mining query optimization in the presence of materialized results of previous queries, focusing on one of the popular data mining techniques, called discovery of sequential patterns.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Efficient Constraint-Based Sequential Pattern Mining Using Dataset Filtering Techniques', Proc. of the Fifth International Baltic Conference on Databases and Information Systems (DB&IS 2002), Tallinn, Estonia, 2002. (Extended version, BibTeX)
The basic formulation of the sequential pattern discovery problem assumes that the only constraint to be satisfied by discovered patterns is the minimum support threshold. However, very often users want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper we discuss efficient constraint-based sequential pattern mining using dataset filtering techniques. We show how to transform a given data mining task into an equivalent one operating on a smaller dataset. We present an extension of the GSP algorithm using dataset filtering techniques and experimentally evaluate performance gains offered by the proposed method.
Marek Wojciechowski, Maciej Zakrzewicz: 'Methods for Batch Processing of Data Mining Queries', Proc. of the Fifth International Baltic Conference on Databases and Information Systems (DB&IS 2002), Tallinn, Estonia, 2002. (pdf, BibTeX)
Data mining is a useful decision support technique, which can be used to find trends and regularities in warehouses of corporate data. A serious problem of its practical applications is long processing time required by data mining algorithms. Current systems consume minutes or hours to answer single requests, while typically batches of the requests are delivered to the systems. In this paper we present the problem of batch processing of data mining requests. We introduce methods that analyze similarities between separate requests to reduce the processing cost. We also perform a comparative performance analysis of the proposed methods.
Mikołaj Morzy, Marek Wojciechowski: 'Mechanizm perspektyw materializowanych w eksploracji danych', Systemy informatyczne - zastosowania i wdrożenia 2002, Wydawnictwa Naukowo-Techniczne, Warszawa-Szczyrk, 2002. (pdf, BibTeX)
Data mining is an interactive and iterative process. A user defines the set of patterns of interest by specifying the dataset to be mined and choosing particular values of the mining parameters. It is very likely that, in order to obtain satisfying results, the user will run the mining task many times, each time slightly changing the mined dataset or modifying the algorithm parameters. Currently available data mining algorithms are characterized by long processing times, directly proportional to the size of the analyzed data. Since mining most often takes place in a data warehouse environment, long processing times are unacceptable from the point of view of interactive mining. On the other hand, the results of consecutive user queries are very similar. One solution to the problem of long processing times of data mining queries is to use materialized results of earlier queries. In this paper we present the concept of materialized data mining views and ways of using such views in processing data mining queries. We show how this mechanism can considerably speed up the discovery of association rules or sequential patterns. We also indicate directions for further research in this area.
Marek Wojciechowski, Maciej Zakrzewicz: 'Automatyczna personalizacja serwerów WWW z wykorzystaniem metod eksploracji danych', Systemy informatyczne - zastosowania i wdrożenia 2002, Wydawnictwa Naukowo-Techniczne, Warszawa-Szczyrk, 2002. (pdf, BibTeX)
This paper is devoted to automatic personalization of web sites based on so-called adaptive web servers. Personalization of web sites consists in using known preference profiles to dynamically adjust site content to the needs of individual users. Adaptive web servers automatically discover typical user behavior patterns by analyzing, with data mining techniques, the site usage information contained in the server log. The paper presents the general idea of adaptive web servers and detailed proposals for their implementation. It also devotes considerable space to techniques for collecting reliable information about web site usage and preprocessing it into a format suitable for data mining techniques.
Bogdan Czejdo, Mikołaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Materialized Views in Data Mining', Proc. of the 1st DEXA Workshop on Very Large Data Warehouses (VLDWH'02), Aix-en-Provence, France, IEEE Computer Society, 2002. (pdf, BibTeX)
Data mining is an interactive and iterative process. It is highly probable that a user will issue a series of similar queries until he or she receives satisfying results. Currently available mining algorithms suffer from long processing times depending mainly on the size of the dataset. As the pattern discovery takes place mainly in the data warehouse environment, such long processing times are unacceptable from the point of view of interactive data mining. On the other hand, the results of consecutive data mining queries are usually very similar. This observation leads to the idea of reusing materialized results of previous data mining queries in order to improve performance of the system. In this paper we present the concept of materialized data mining views and we show how the results stored in these views can be used to accelerate processing of data mining queries. We demonstrate the use of materialized views in the domains of association rules discovery and sequential pattern search.
Marek Wojciechowski, Maciej Zakrzewicz: 'Data Access Paths for Frequent Itemsets Discovery', Proc. of the 6th East European Conference on Advances in Databases and Information Systems (ADBIS'02), Bratislava, Slovakia, LNCS 2435, © Springer-Verlag, 2002. (pdf, BibTeX)
Many frequent itemset discovery algorithms have been proposed in the area of data mining research. The algorithms exhibit significant computational complexity, resulting in long processing times. Their performance is also dependent on source data characteristics. We argue that users should not be responsible for choosing the most efficient algorithm to solve a particular data mining problem. Instead, a data mining query optimizer should follow the cost-based optimization rules to select the appropriate method to solve the user's problem. The optimizer should consider alternative data mining algorithms as well as alternative data access paths. In this paper, we use the concept of materialized views to describe possible data access paths for frequent itemset discovery.
Marek Wojciechowski, Maciej Zakrzewicz: 'On Efficiency of Dataset Filtering Implementations in Constraint-Based Discovery of Frequent Itemsets', Proc. of the 5th Joint Conference on Knowledge-Based Software Engineering (JCKBSE'02), Maribor, Slovenia, IOS Press, 2002. (pdf, BibTeX)
Discovery of frequent itemsets is one of the fundamental data mining problems. Typically, the goal is to discover all the itemsets whose support in the source dataset exceeds a user-specified threshold. However, very often users want to restrict the set of frequent itemsets to be discovered by adding extra constraints on size and contents of the itemsets. Many constraint-based frequent itemset discovery techniques have been proposed recently. One of the techniques, called dataset filtering, is based on the observation that for some classes of constraints, itemsets satisfying them can only be supported by transactions that satisfy the same constraints. Conceptually, dataset filtering transforms a given data mining task into an equivalent one operating on a smaller dataset. In this paper we discuss possible implementations of dataset filtering, evaluating their strengths and weaknesses.
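Dataset filtering can be shown on the simplest kind of constraint. The sketch below assumes the constraint "the itemset must contain a given item" and a brute-force miner. Both are hypothetical choices for illustration and are not taken from the paper. Because only transactions containing the required item can support a qualifying itemset, mining the filtered dataset with the same absolute support count yields the same answer as mining everything and filtering afterwards.

```python
from itertools import combinations


def mine_itemsets(transactions, min_count):
    """Brute-force miner (illustration only): every itemset that occurs
    in at least min_count transactions, with its count."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def mine_with_required_item(transactions, required_item, min_count):
    """Filter the dataset first, then mine the smaller dataset.

    min_count is an absolute number of transactions, so it keeps its
    meaning after filtering; a relative threshold would need rescaling.
    """
    filtered = [t for t in transactions if required_item in t]
    mined = mine_itemsets(filtered, min_count)
    return {itemset: n for itemset, n in mined.items() if required_item in itemset}


# Hypothetical toy data.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"c"}]
constrained = mine_with_required_item(transactions, "a", min_count=2)
```

In this toy dataset the miner sees two transactions instead of four, and the saving grows with the share of transactions that fail the constraint.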
Marek Wojciechowski, Maciej Zakrzewicz: 'Dataset Filtering Techniques in Constraint-Based Frequent Pattern Mining', Proc. of the ESF Exploratory Workshop on Pattern Detection and Discovery in Data Mining, London, UK, LNAI 2447, © Springer-Verlag, 2002. (pdf, BibTeX)
Many data mining techniques consist in discovering patterns frequently occurring in the source dataset. Typically, the goal is to discover all the patterns whose frequency in the dataset exceeds a user-specified threshold. However, very often users want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper, we focus on improving the efficiency of constraint-based frequent pattern mining by using dataset filtering techniques. Dataset filtering conceptually transforms a given data mining task into an equivalent one operating on a smaller dataset. We present transformation rules for various classes of patterns: itemsets, association rules, and sequential patterns, and discuss implementation issues regarding integration of dataset filtering with well-known pattern discovery algorithms.
Marek Wojciechowski, Maciej Zakrzewicz: 'Analiza porównawcza technologii tworzenia aplikacji internetowych dla baz danych Oracle', Materiały VIII konf. PLOUG, Zakopane, 2002. (pdf, BibTeX)
Designers of web applications for Oracle databases can choose from many different technologies, ranging from Oracle-specific solutions based on the PL/SQL language, through general-purpose Java technologies, strongly supported by Oracle development tools and application servers, to scripting languages such as PHP or Perl, which are very popular in the Linux environment. The aim of this paper is to present the advantages and disadvantages of these technologies and to compare their functionality and efficiency in the context of cooperation with an Oracle database.
Marek Wojciechowski, Maciej Zakrzewicz: 'TPC Benchmarking', Materiały VIII konf. PLOUG, Zakopane, 2002. (pdf, BibTeX)
Comparative analysis of the performance of different database systems requires building a model, representative test application whose efficiency serves as the basis for a comparative measure. Developing standard test application models for different system workload characteristics is the purpose of the TPC (Transaction Processing Performance Council), an organization founded in 1988. The paper discusses the most important TPC benchmarks commonly used by database system vendors: TPC-C, TPC-H, TPC-R, and TPC-W.
Piotr Dachtera, Piotr Jurga, Marek Wojciechowski, Maciej Zakrzewicz: 'Adaptatywność interfejsu użytkownika w aplikacjach internetowych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-004/02, 2002.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Efficient Constraint-Based Sequential Pattern Mining Using Dataset Filtering Techniques' (extended version), H-M. Haav, A. Kalja (Eds.), Databases and Information Systems II, Selected Papers from the Fifth International Baltic Conference, BalticDB&IS'2002, Kluwer Academic Publishers, 2002. (pdf, BibTeX)
The basic formulation of the sequential pattern discovery problem assumes that the only constraint to be satisfied by discovered patterns is the minimum support threshold. However, very often users want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper we discuss efficient constraint-based sequential pattern mining using dataset filtering techniques. We show how to transform a given data mining task into an equivalent one operating on a smaller dataset. We present an extension of the GSP algorithm using dataset filtering techniques and experimentally evaluate performance gains offered by the proposed method.
Krzysztof Jankiewicz, Tomasz Traczyk, Marek Wojciechowski, Maciej Zakrzewicz: 'Przechowywanie i przetwarzanie danych XML w systemach baz danych', Materiały II Szkoły PLOUG, Poznań, 2003.
Marek Wojciechowski, Maciej Zakrzewicz: 'Obsługa transakcji rozproszonych w języku Java', Materiały VII Seminarium PLOUG 'Budowa systemów rozproszonych w technologii Oracle', Warszawa, 2003. (pdf, BibTeX)
This paper is devoted to transaction processing in Java applications, with particular emphasis on support for distributed transactions. It presents the capabilities offered in this area by the JDBC and JTA standards, which are based on the general XA standard. The paper describes a typical transaction processing architecture for the J2EE platform and the ways in which middle-tier Java applications carry out transactions under the JTA standard.
Maciej Zakrzewicz, Marek Wojciechowski: 'Budowa komponentów Enterprise JavaBeans', Materiały VII Seminarium PLOUG 'Budowa systemów rozproszonych w technologii Oracle', Warszawa, 2003. (pdf, BibTeX)
The Enterprise JavaBeans (EJB) technology allows Java programmers to build distributed application components, most often used to implement so-called business logic. The paper discusses the architecture and construction techniques of EJB components, as well as methods of communicating with them from programs written in Java.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz, Piotr Dachtera, Piotr Jurga: 'Implementing Adaptive User Interface for Web Applications', Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'03 Conference, Zakopane, Poland, Advances in Soft Computing, Springer-Verlag, 2003. (pdf, BibTeX)
Adaptive web sites automatically improve their organization and presentation to satisfy the needs of individual web users. The paper describes the experience we gained while designing and implementing AdAgent, an adaptive extension to a web server. AdAgent is based on an adaptation model in which lists of recommended links are dynamically generated for each browsing user and embedded in web pages. AdAgent consists of two components: the off-line module using web access logs to discover knowledge about users' behavior, and the on-line module extending the web server functionality, responsible for dynamic personalization of web pages.
Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz, Piotr Dachtera, Piotr Jurga: 'AdAgent: Template-based Approach to Adaptive Web Sites', Proc. of the First Symposium on Databases, Data Warehousing and Knowledge Discovery, Baden Baden, Germany, Scientific Publishers OWN, 2003. (pdf, BibTeX)
Adaptive web sites automatically improve their organization and presentation to satisfy the needs of individual web users. Typically, such improvement is achieved by automatic link generation. AdAgent is our prototype adaptive extension to a web server, which provides web users with automatically discovered recommended links. In this paper we focus on our template-based method for creating adaptive web pages by using extended HTML tags.
Marek Wojciechowski, Maciej Zakrzewicz: 'Evaluation of Common Counting Method for Concurrent Data Mining Queries', Proc. of the 7th East European Conference on Advances in Databases and Information Systems (ADBIS'03), Dresden, Germany, LNCS 2798, © Springer-Verlag, 2003. (pdf, BibTeX)
Data mining queries are often submitted concurrently to the data mining system. The data mining system should take advantage of overlapping of the mined datasets. In this paper we focus on frequent itemset mining and we discuss and experimentally evaluate the implementation of the Common Counting method on top of the Apriori algorithm. The general idea of Common Counting is to reduce the number of times the common parts of the source datasets are scanned during the processing of the set of frequent pattern queries.
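A minimal Python sketch of the Common Counting idea from the abstract above, assuming each query is given as a set of transaction identifiers plus a list of candidate itemsets. It shows a single counting pass only. The function name, the data layout, and the read counter are hypothetical, and a real Apriori-based implementation would repeat such a pass for each candidate size, typically with hash-trees.

```python
def common_counting_pass(dataset, queries):
    """Count candidates of all queries in one shared scan.

    dataset: list of (tid, set_of_items) pairs (hypothetical layout).
    queries: dict mapping a query name to (tids, candidates), where tids is a
             container of transaction ids and candidates is a list of frozensets.
    Returns (counts, reads): per-query candidate counts and the number of
    transactions actually read.
    """
    counts = {name: {cand: 0 for cand in cands}
              for name, (_, cands) in queries.items()}
    reads = 0
    for tid, items in dataset:
        interested = [name for name, (tids, _) in queries.items() if tid in tids]
        if not interested:
            continue  # transaction outside every query's dataset
        reads += 1  # read once, even if several queries need it
        for name in interested:
            for cand in counts[name]:
                if cand <= items:  # candidate is contained in the transaction
                    counts[name][cand] += 1
    return counts, reads
```

With two queries whose datasets overlap, each transaction in the overlap is read once instead of once per query, which is the saving the method aims for.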
Marek Wojciechowski, Łukasz Matuszczak: 'Oracle interMedia na tle standardu SQL/MM i prototypowych systemów multimedialnych baz danych', Materiały IX konf. PLOUG, Zakopane, 2003. (pdf, BibTeX)
Multimedia data processing has for years been the subject of many research projects and scientific papers. We are currently witnessing the emergence of the SQL/MM standard, a large part of which is devoted to handling multimedia data from within SQL. Oracle interMedia handles images, audio, video sequences, and heterogeneous multimedia data within the Oracle9i database management system. The aim of this paper is to assess the capabilities offered by Oracle interMedia in light of the SQL/MM specification and the functionality of research prototypes.
Łukasz Matuszczak, Marek Wojciechowski: 'Wyszukiwanie obrazów w bazach danych', Czasopismo Stowarzyszenia Polskiej Grupy Użytkowników Systemu Oracle ORACLE'owe PLOUG'tki 28, 2003. (pdf, BibTeX)
Marek Wojciechowski: 'Supporting Interactive Sequential Pattern Discovery in Databases', Proc. of the Emerging Database Research in East Europe, VLDB 2003 Workshop, Berlin, Germany, 2003. (pdf, BibTeX)
One of the most important data mining problems is discovery of sequential patterns. Sequential pattern mining consists in discovering all frequently occurring subsequences in a collection of data sequences. This paper discusses several issues concerning possible extensions to traditional database management systems required to support sequential pattern discovery: a sequential pattern query language for specifying mining tasks and storing discovered patterns in the database, techniques of integrating various pattern constraints that can be specified in mining queries into the mining process in order to improve performance, and a framework for exploiting cached results of previous queries to support iterative and interactive data mining. The paper summarizes the author's recent research on the above topics.
Marek Wojciechowski, Maciej Zakrzewicz: 'Algorytmy współbieżnego przetwarzania zapytań eksploracyjnych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-036/03, 2003.
Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Projektowanie aplikacji dla platformy J2EE', Materiały III Szkoły PLOUG, Poznań, 2004.
Yannis Manolopoulos, Mikolaj Morzy, Tadeusz Morzy, Alexandros Nanopoulos, Marek Wojciechowski, Maciej Zakrzewicz: 'Indexing Techniques for Web Access Logs', Web Information Systems, Idea Group Publishing, 2004. (Info, BibTeX)
Access histories of users visiting a web server are automatically recorded in web access logs. Conceptually, the web-log data can be regarded as a collection of clients' access-sequences where each sequence is a list of pages accessed by a single user in a single session. This chapter presents novel indexing techniques that support efficient processing of so-called pattern queries, which consist in finding all access sequences that contain a given subsequence. Pattern queries are a key element of advanced analyses of web-log data, especially those concerning typical navigation schemes. In this chapter, we discuss the particularities of efficiently processing user access-sequences with pattern queries, compared to the case of searching unordered sets. Extensive experimental results are given, which examine a variety of factors and illustrate the superiority of the proposed methods over indexing techniques for unordered data adapted to access sequences.
Krzysztof Jankiewicz, Marek Wojciechowski: 'Standard SQL/MM: SQL Multimedia and Application Packages', Materiały IX Seminarium PLOUG 'Przetwarzanie zaawansowanych struktur danych: Oracle interMedia, Spatial, Text i XML DB', Warszawa, 2004. (pdf, BibTeX)
SQL/MM is a new standard that extends SQL with support for advanced data types. The aim of this paper is to present the main ideas of the SQL/MM standard, in particular its parts devoted to processing text, spatial, and image data in databases. The compliance of the corresponding solutions offered by Oracle10g with the SQL/MM standard is also analyzed.
Marek Wojciechowski, Maciej Zakrzewicz: 'Data Mining Query Scheduling for Apriori Common Counting', Proc. of the Sixth International Baltic Conference on Databases and Information Systems (DB&IS 2004), Riga, Latvia, 2004. (pdf, BibTeX)
Also published in a book: Databases and Information Systems, Selected Papers from the Sixth International Baltic Conference DB&IS'2004, IOS Press, 2005. (BibTeX)
In this paper we consider concurrent execution of multiple data mining queries. If such data mining queries operate on similar parts of the database, then their overall I/O cost can be reduced by integrating their data retrieval operations. The integration requires that many data mining queries are present in memory at the same time. If the memory size is not sufficient to hold all the data mining queries, then the queries must be scheduled into multiple phases of loading and processing. We discuss the problem of data mining query scheduling and propose a heuristic algorithm to efficiently schedule the data mining queries into phases.
Marek Wojciechowski, Maciej Zakrzewicz: 'Evaluation of the Mine Merge Method for Data Mining Query Processing', Proc. of the 8th East European Conference on Advances in Databases and Information Systems (ADBIS'04), Budapest, Hungary, 2004. (pdf, BibTeX)
In this paper we consider concurrent execution of multiple data mining queries in the context of discovery of frequent itemsets. If such data mining queries operate on similar parts of the database, then their overall I/O cost can be reduced by transforming the set of data mining queries into another set of non-overlapping queries, whose results can be used to efficiently answer the original queries. We discuss the problem of multiple data mining query optimization and experimentally evaluate the Mine Merge algorithm to efficiently execute sets of data mining queries.
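The transformation into non-overlapping queries described in the abstract above can be sketched in Python for two queries. The sketch assumes that both queries share one relative support threshold and that the original queries are answered by verifying the union of locally frequent itemsets in one extra pass. That verification step follows the well-known Partition principle and is an assumption of this example, as are all names and the naive miner.

```python
from collections import Counter
from itertools import combinations


def mine(transactions, min_ratio):
    """Naive miner (illustration only): itemsets with relative support >= min_ratio."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if c >= min_ratio * n}


def mine_merge(dataset, q1_tids, q2_tids, min_ratio):
    """dataset: dict tid -> set of items; q1_tids, q2_tids: sets of tids.
    Returns the answers to both queries as a list of two sets of itemsets."""
    # Step 1: replace the overlapping queries with non-overlapping ones.
    parts = {
        "only1": q1_tids - q2_tids,
        "both": q1_tids & q2_tids,
        "only2": q2_tids - q1_tids,
    }
    # Step 2: mine every non-overlapping part once.
    local = {name: mine([dataset[t] for t in sorted(tids)], min_ratio)
             for name, tids in parts.items()}
    # Step 3: an itemset frequent in a query's dataset must be frequent in at
    # least one of its parts, so the union of local results is a complete
    # candidate set; one counting pass removes the false positives.
    answers = []
    for tids, names in ((q1_tids, ("only1", "both")), (q2_tids, ("both", "only2"))):
        candidates = set().union(*(local[n] for n in names))
        rows = [dataset[t] for t in tids]
        answers.append({c for c in candidates
                        if sum(c <= r for r in rows) >= min_ratio * len(rows)})
    return answers
```

In this sketch the shared part is mined once for both queries, which is where the saving comes from when the overlap is large.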
Mikolaj Morzy, Maciej Zakrzewicz, Marek Wojciechowski: 'A Study on Answering a Data Mining Query Using a Materialized View', Proc. of the 19th International Symposium on Computer and Information Sciences (ISCIS'04), Kemer – Antalya, Turkey, LNCS 3280, © Springer-Verlag, 2004. (pdf, BibTeX)
One of the classic data mining problems is discovery of frequent itemsets. This problem particularly attracts the database community as it resembles traditional database querying. In this paper we consider a data mining system which supports storing of previous query results in the form of materialized data mining views. While numerous works have shown that reusing results of previous frequent itemset queries can significantly improve performance of data mining query processing, a thorough study of possible differences between the current query and a materialized view has not been presented yet. In this paper we classify possible differences into six classes, provide I/O cost analysis for all the classes, and experimentally evaluate the most promising ones.
Yannis Manolopoulos, Mikolaj Morzy, Tadeusz Morzy, Alexandros Nanopoulos, Marek Wojciechowski, Maciej Zakrzewicz: 'Signature-Based Indexing Techniques for Web Access Logs', Encyclopedia of Information Science and Technology, 2005. (Info, BibTeX)
Krzysztof Jankiewicz, Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Hurtownie danych: od koncepcji do wdrożenia', Materiały IV Szkoły PLOUG, Poznań, 2005.
Marek Wojciechowski, Maciej Zakrzewicz: 'On Multiple Query Optimization in Data Mining', Proc. of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05), Hanoi, Vietnam, LNAI 3518, © Springer-Verlag, 2005. (pdf, BibTeX)
Traditional multiple query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of data mining queries submitted to a Knowledge Discovery Management System. We describe the problem of data mining query scheduling and we introduce a new algorithm called CCAgglomerative to schedule data mining queries for frequent itemset discovery.
Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Intelligent Reputation Assessment for Participants of Web-based Customer-to-Customer Auctions', Proc. of the 3rd Atlantic Web Intelligence Conference (AWIC'05), Lodz, Poland, LNAI 3528, © Springer-Verlag, 2005. (pdf, BibTeX)
The Internet is witnessing an unprecedented boom of customer-to-customer e-commerce. Most online auction providers use simple participation counts for reputation rating, thus enabling dishonest participants to cheat. In this paper we propose a novel definition of reputation and credibility of C2C e-commerce participants and we present an algorithm for reputation rating estimation. We conduct several experiments on real-world data which prove the feasibility of our algorithm.
Marek Wojciechowski, Maciej Zakrzewicz: 'Efficient Processing of Frequent Itemset Queries Using a Collection of Materialized Views', New Trends in Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'05 Conference, Gdansk, Poland, Advances in Soft Computing, Springer-Verlag, 2005. (pdf, BibTeX)
One of the classic data mining problems is discovery of frequent itemsets. Frequent itemset discovery tasks can be regarded as advanced database queries specifying the source dataset, the minimum support threshold, and optional constraints on itemsets. We consider a data mining system which supports storing of results of previous queries in the form of materialized data mining views. Previous work on materialized data mining views addressed the issue of reusing results of one of the previous frequent itemset queries to efficiently answer the new query. In this paper we present a new approach to frequent itemset query processing in which a collection of materialized views can be used for that purpose.
Marek Wojciechowski, Maciej Zakrzewicz: 'Heuristic Scheduling of Concurrent Data Mining Queries', Proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA 2005), Wuhan, China, LNAI 3584, © Springer-Verlag, 2005. (pdf, BibTeX)
Execution cost of batched data mining queries can be reduced by integrating their I/O steps. Due to memory limitations, not all data mining queries in a batch can be executed together. In this paper we introduce a heuristic algorithm called CCFull, which suboptimally schedules the data mining queries into a number of execution phases. The algorithm runs significantly faster than the optimal approach while providing very good accuracy.
Mikolaj Morzy, Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Incremental Data Mining Using Concurrent Online Refresh of Materialized Data Mining Views', Proc. of the 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Copenhagen, Denmark, LNCS 3589, © Springer-Verlag, 2005. (pdf, BibTeX)
Data mining is an iterative process. Users issue series of similar data mining queries, in each consecutive run slightly modifying either the definition of the mined dataset, or the parameters of the mining algorithm. This model of processing is most suitable for incremental mining algorithms that reuse the results of previous queries when answering a given query. Incremental mining algorithms require the results of previous queries to be available. One way to preserve those results is to use materialized data mining views. Materialized data mining views store the mined patterns and refresh them as the underlying data change. Data mining and knowledge discovery often take place in a data warehouse environment. There can be many relatively small materialized data mining views defined over the data warehouse. Separate refresh of each materialized view can be expensive, if the refresh process has to re-discover patterns in the original database. In this paper we present a novel approach to materialized data mining view refresh process. We show that the concurrent on-line refresh of a set of materialized data mining views is more efficient than the sequential refresh of individual views. We present the framework for the integration of data warehouse refresh process with the maintenance of materialized data mining views. Finally, we prove the feasibility of our approach by conducting several experiments on synthetic data sets.
Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Optimizing a Sequence of Frequent Pattern Queries', Proc. of the 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Copenhagen, Denmark, LNCS 3589, © Springer-Verlag, 2005. (pdf, BibTeX)
Discovery of frequent patterns is a very important data mining problem with numerous applications. Frequent pattern mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on efficient processing of frequent pattern queries has been done in recent years, focusing mainly on constraint handling and reusing results of previous queries. In this paper we tackle the problem of optimizing a sequence of frequent pattern queries, submitted to the system as a batch. Our solutions are based on previously proposed techniques of reusing results of previous queries, and exploit the fact that knowing a sequence of queries a priori gives the system a chance to schedule and/or adjust the queries so that they can use results of queries executed earlier. We begin with simple query scheduling and then consider other transformations of the original batch of queries.
Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek: 'Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm', Proc. of the 1st ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'05), Tallinn, Estonia, 2005. (pdf, BibTeX)
Discovery of frequent itemsets is a very important data mining problem with numerous applications. Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on frequent itemset mining has been done so far, focusing mainly on developing faster complete mining algorithms, efficient constraint handling, and reusing results of previous queries. Recently, a new problem of optimizing processing of batches of frequent itemset queries has been considered, and two multiple query optimization techniques for frequent itemset queries, Common Counting and Mine Merge, have been proposed. Mine Merge does not depend on a particular mining algorithm, while Common Counting has been specifically designed to work with Apriori. Nevertheless, in previous works the efficiency of Mine Merge was tested only on Apriori, and it is unclear how it would perform with newer pattern-growth algorithms like FP-growth. In this paper we adapt the Common Counting method to work with FP-growth and evaluate the efficiency of both methods when FP-growth is used as the basic mining algorithm.
Stanislaw Prinke, Marek Wojciechowski, Maciej Zakrzewicz: 'Pruning Discovered Sequential Patterns Using Minimum Improvement Threshold', Proc. of the 1st ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'05), Tallinn, Estonia, 2005. (Extended version, pdf, BibTeX)
Discovery of sequential patterns is an important data mining problem with numerous applications. Sequential patterns are subsequences frequently occurring in a database of sequences of sets of items. In a basic scenario, the goal of sequential pattern mining is discovery of all patterns whose frequency exceeds a user-specified frequency threshold. The problem with such an approach is a huge number of sequential patterns which are likely to be returned for reasonable frequency thresholds. One possible solution to this problem is excluding the patterns which do not provide significantly more information than some other patterns in the result set. Two approaches falling into that category have been studied in the context of sequential patterns: discovery of maximal patterns and closed patterns. Unfortunately, the set of maximal patterns may not contain many important patterns with high frequency, and discovery of closed patterns may not reduce the number of resulting patterns for sparse datasets. Therefore, in this paper we propose and experimentally evaluate the minimum improvement criterion to be used in the post-processing phase to reduce the number of sequential patterns returned to the user. Our method is an adaptation of one of the methods previously proposed for association rules.
Marek Wojciechowski, Maciej Zakrzewicz: 'Partycjonowanie grafów a optymalizacja wykonania zbioru zapytań eksploracyjnych', Materiały I Krajowej Konferencji Naukowej Technologie Przetwarzania Danych (TPD 2005), Poznań, 2005. (pdf, BibTeX)
Optimizing the execution of a set of data mining queries consists in partitioning the set of queries into disjoint subsets of concurrently executed queries so that the total I/O cost of their execution is minimal. In this paper we propose a transformation of the problem of optimizing the execution of a set of frequent itemset discovery queries into the graph partitioning problem, which has been intensively studied in many fields of science. We redefine the classic graph partitioning problem and show how existing heuristic algorithms can be adapted to solve the problem of optimizing the execution of a set of data mining queries.
Dariusz Aleksandrowicz, Marek Wojciechowski: 'Technologie szablonów dla serwletów Java', Materiały XI konf. PLOUG, Zakopane, 2005. (pdf, BibTeX)
Template technologies for Java servlets are an alternative to JSP for building the user interface of web applications on the J2EE platform. As with JSP, the motivation for their creation was to enable the separation of static fragments of HTML documents from the Java code that dynamically generates the variable fragments. The aim of this paper is to review existing template technologies and to assess their capabilities and market position. The currently most popular template technologies, Velocity, FreeMarker, and WebMacro, are discussed and illustrated with an example of a simple application.
Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Technologie szkieletowe dla aplikacji J2EE', Materiały V Szkoły PLOUG, Poznań, 2006.
Stanislaw Prinke, Marek Wojciechowski, Maciej Zakrzewicz: 'Pruning Discovered Sequential Patterns Using Minimum Improvement Threshold' (extended version), Foundations of Computing and Decision Sciences, Vol. 31, No. 1, Special issue: Data Mining and Knowledge Discovery, 2006. (pdf, BibTeX)
Discovery of sequential patterns is an important data mining problem with numerous applications. Sequential patterns are subsequences frequently occurring in a database of sequences of sets of items. In a basic scenario, the goal of sequential pattern mining is discovery of all patterns whose frequency exceeds a user-specified frequency threshold. The problem with such an approach is a huge number of sequential patterns which are likely to be returned for reasonable frequency thresholds. One possible solution to this problem is excluding the patterns which do not provide significantly more information than some other patterns in the result set. Two approaches falling into that category have been studied in the context of sequential patterns: discovery of maximal patterns and closed patterns. Unfortunately, the set of maximal patterns may not contain many important patterns with high frequency, and discovery of closed patterns may not reduce the number of resulting patterns for sparse datasets. Therefore, in this paper we propose and experimentally evaluate the minimum improvement criterion to be used in the post-processing phase to reduce the number of sequential patterns returned to the user. Our method is an adaptation of one of the methods previously proposed for association rules.
Marek Wojciechowski: 'AJAX - rewolucja w tworzeniu aplikacji internetowych', Materiały XII Seminarium PLOUG 'Nowe technologie XML', Warszawa, 2006. (pdf, BibTeX)
Asynchronous JavaScript And XML (AJAX) is a new technique for building web applications that makes it possible to achieve a level of user interface interactivity previously unseen in applications of this type. AJAX is not a new standalone technology; it merely systematizes the way interactive web applications are built using JavaScript and CSS, the Document Object Model (DOM), and the XMLHttpRequest object. The aim of the paper is to present the motivation for AJAX, its assumptions, advantages, and disadvantages, and to discuss how AJAX applications are built and the roles of the individual component technologies.
Przemyslaw Grudzinski, Marek Wojciechowski, Maciej Zakrzewicz: 'Partition-Based Approach to Processing Batches of Frequent Itemset Queries', Proceedings of the 7th International Conference on Flexible Query Answering Systems (FQAS 2006), Milan, Italy, LNAI 4027, © Springer-Verlag, 2006. (pdf, BibTeX)
We consider the problem of optimizing processing of batches of frequent itemset queries. The problem is a particular case of multiple-query optimization, where the goal is to minimize the total execution time of the set of queries. We propose an algorithm that is a combination of the Mine Merge method, previously proposed for processing of batches of frequent itemset queries, and the Partition algorithm for memory-based frequent itemset mining. The experiments show that the novel approach outperforms the original Mine Merge and sequential processing in the majority of cases.
Pawel Boinski, Konrad Jozwiak, Marek Wojciechowski, Maciej Zakrzewicz: 'Improving Quality of Agglomerative Scheduling in Concurrent Processing of Frequent Itemset Queries', New Trends in Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'06 Conference, Ustron, Poland, Advances in Soft Computing, Springer, 2006. (pdf, BibTeX)
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of batches of frequent itemset queries has been considered. The best technique for this problem proposed so far is Common Counting, which consists in concurrent processing of frequent itemset queries and integrating their database scans. Common Counting requires that data structures of several queries are stored in main memory at the same time. Since in practice memory is limited, the crucial problem is scheduling the queries to Common Counting phases so that the I/O cost is optimized. According to our previous studies, the best algorithm for this task, applicable to large batches of queries, is CCAgglomerative. In this paper we present a novel query scheduling method CCAgglomerativeNoise, built around CCAgglomerative, increasing its chances of finding an optimal solution.
Pawel Boinski, Konrad Jozwiak, Marek Wojciechowski, Maciej Zakrzewicz: 'Estimating Hash-Tree Sizes in Concurrent Processing of Frequent Itemset Queries', International Journal of Information Technology and Intelligent Computing, Vol. 1, No. 2, 2006. Presented at 8th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2006), Zakopane, Poland, 2006. (pdf, BibTeX)
We consider the problem of optimizing the processing of batches of frequent itemset queries. One of the methods proposed for this task is Apriori Common Counting, which consists in concurrent processing of frequent itemset queries and integrating their database scans. Apriori Common Counting requires that hash-trees of several queries are stored in main memory at the same time. Since in practice memory is limited, the crucial problem is scheduling the queries to execution phases so that the I/O cost is optimized. As the scheduling algorithm has to know the hash-tree sizes of the queries, previous approaches generated all the hash-trees before scheduling and swapped them to disk, which introduced extra I/O cost. In this paper we present a method of calculating an upper bound on the size of a hash tree, and propose to schedule the queries using estimates instead of actual hash-tree sizes.
Pawel Boinski, Marek Wojciechowski, Maciej Zakrzewicz: 'A Greedy Approach to Concurrent Processing of Frequent Itemset Queries', Proc. of the 8th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2006), Krakow, Poland, LNCS 4081, © Springer-Verlag, 2006. (pdf, BibTeX)
We consider the problem of concurrent execution of multiple frequent itemset queries. If such data mining queries operate on overlapping parts of the database, then their overall I/O cost can be reduced by integrating their dataset scans. The integration requires that data structures of many data mining queries are present in memory at the same time. If the memory size is not sufficient to hold all the data mining queries, then the queries must be scheduled into multiple phases of loading and processing. Since finding the optimal assignment of queries to phases is infeasible for large batches of queries due to the size of the search space, heuristic algorithms have to be applied. In this paper we formulate the problem of assigning the queries to phases as a particular case of hypergraph partitioning. To solve the problem, we propose and experimentally evaluate two greedy optimization algorithms.
Marek Wojciechowski: 'Co nowego w Java EE?', Materiały XII konf. PLOUG, Zakopane, 2006. (pdf, BibTeX)
The latest version of the Java Platform Enterprise Edition specification introduces a number of significant changes that affect the way applications are built with this technology. The paper presents the most important changes and new elements of the specification and illustrates them with examples, with particular emphasis on EJB 3.0 and the new Java Persistence standard.
Marek Wojciechowski: 'Tworzenie aplikacji dla Oracle Application Server 10g R3 w technologii EJB 3.0', Materiały XV Seminarium PLOUG 'Oracle Application Server 10g R3: aplikacje, wydajność, bezpieczeństwo, niezawodność', Warszawa, 2007. (pdf, BibTeX)
One of the new features of Oracle Application Server 10g R3 is support for the EJB 3.0 standard. Since the beginning of the Java Enterprise Edition platform, Enterprise JavaBeans (EJB) components have been promoted as the primary technology for implementing business logic. Unfortunately, earlier versions of the technology were excessively complex and could lead to inefficient solutions, especially when used for communication with a database. In version 3.0, EJB has been significantly simplified, and database communication has been separated out into a distinct standard called the Java Persistence API. The aim of this paper is to discuss EJB 3.0 and the development of EJB 3.0 applications for Oracle Application Server 10g R3 in the Oracle JDeveloper 10g environment.
Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek: 'Three Strategies for Concurrent Processing of Frequent Itemset Queries Using FP-growth', Knowledge Discovery in Inductive Databases, 5th International Workshop, KDID 2006, Berlin, Germany, September 18, 2006, Revised Selected and Invited Papers, LNCS 4747, © Springer, 2007. (pdf, BibTeX)
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of sets of frequent itemset queries has been considered, and two multiple-query optimization techniques for frequent itemset queries, Mine Merge and Common Counting, have been proposed and tested on the Apriori algorithm. In this paper we discuss and experimentally evaluate three strategies for concurrent processing of frequent itemset queries using FP-growth as a basic frequent itemset mining algorithm. The first strategy is Mine Merge, which does not depend on a particular mining algorithm and can be applied to FP-growth without modifications. The second is an implementation of the general idea of Common Counting for FP-growth. The last is a completely new strategy, motivated by identified shortcomings of the previous two strategies in the context of FP-growth.
Przemyslaw Grudzinski, Marek Wojciechowski: 'Integracja drzew kandydatów w przetwarzaniu zbiorów zapytań eksploracyjnych algorytmem Apriori', Materiały II Krajowej Konferencji Naukowej Technologie Przetwarzania Danych (TPD 2007), Poznań, 2007. (pdf, BibTeX)
This paper addresses the problem of processing sets of data mining queries for frequent itemset discovery using the Apriori algorithm. The best of the existing solutions is Common Counting, i.e., concurrent execution of the queries using Apriori with integrated reads of the database fragments shared by the queries. In this work we propose a new method of processing sets of data mining queries for Apriori, Common Candidate Tree, based on tighter integration of the processed queries through sharing of data structures in main memory. Experimental results show that Common Candidate Tree is more efficient than Common Counting and, thanks to its more economical use of main memory, can additionally be applied to larger sets of queries.
Przemyslaw Grudzinski, Marek Wojciechowski: 'Integration of Candidate Hash Trees in Concurrent Processing of Frequent Itemset Queries Using Apriori', Proc. of the 3rd ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'07), Varna, Bulgaria, 2007. (Polish version, pdf, BibTeX)
In this paper we address the problem of processing batches of frequent itemset queries using the Apriori algorithm. The best solution to this problem proposed so far is Common Counting, which consists in concurrent execution of the queries using Apriori with the integration of scans of the parts of the database shared among the queries. We propose a new method, Common Candidate Tree, offering a tighter integration of the concurrently processed queries by sharing memory data structures, i.e., candidate hash trees. The experiments show that Common Candidate Tree outperforms Common Counting in terms of execution time. Moreover, thanks to smaller memory consumption, Common Candidate Tree can be applied to larger batches of queries.
Piotr Błoch, Marek Wojciechowski: 'Analiza porównawcza technologii odwzorowania obiektowo-relacyjnego dla aplikacji Java', Materiały XIII konf. PLOUG, Zakopane, 2007. (pdf, BibTeX)
Advanced technologies for handling database communication in Java applications are based on the concept of object-relational mapping. This idea also underlies the new Java Persistence API standard, developed together with EJB 3.0. The aim of the paper is to compare the capabilities and performance of the most popular object-relational mapping technologies: Hibernate and Oracle Toplink, as well as the JDO and Java Persistence API standards.
Juliusz Jezierski, Mariusz Masewicz, Mikolaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Oracle Database 11g: podstawy administracji', Materiały VI Szkoły PLOUG, Poznań, 2008.
Marek Wojciechowski: 'Od interMedia do Multimedia - obsługa danych multimedialnych w Oracle 10g/11g', Materiały XIV konf. PLOUG, Szczyrk, 2008. (pdf, BibTeX)
Oracle Multimedia is a feature of the Oracle 11g database server, available in both the Enterprise Edition and the Standard Edition, and known in previous server versions as Oracle interMedia. Oracle Multimedia enables storage, retrieval and, to some extent, processing of images, audio data and video data. The aim of the paper is to present the new features and changes in Oracle interMedia 10.2 and Oracle Multimedia 11.1, with particular emphasis on image metadata extraction and support for medical images.
Wojciech Buras, Marek Wojciechowski: 'Examination of the applicability of Service Oriented Architecture (SOA) to small business applications', Technical Report RA-17/08, Poznan University of Technology, 2008.
Przemyslaw Grudzinski, Marek Wojciechowski: 'Integration of Candidate Hash Trees in Concurrent Processing of Frequent Itemset Queries Using Apriori', Control and Cybernetics, Vol. 38, No. 1, 2009. (pdf, BibTeX)
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. In this paper we address the problem of processing batches of frequent itemset queries using the Apriori algorithm. The best solution to this problem proposed so far is Common Counting, which consists in concurrent execution of the queries using Apriori with the integration of scans of the parts of the database shared among the queries. We propose a new method, Common Candidate Tree, offering a tighter integration of the concurrently processed queries by sharing memory data structures, i.e., candidate hash trees. The experiments show that Common Candidate Tree outperforms Common Counting in terms of execution time. Moreover, thanks to smaller memory consumption, Common Candidate Tree can be applied to larger batches of queries.
Mariusz Masewicz, Mikołaj Morzy, Marek Wojciechowski, Maciej Zakrzewicz: 'Oracle ADF - wzorce konwersji dla Oracle Forms', Materiały X Szkoły PLOUG, Poznań, 2009.
Bartosz Mordaka, Marek Wojciechowski: 'Oracle ADF i JBoss Seam - dwa skrajnie różne podejścia do współpracy JSF z EJB', Materiały XV konf. PLOUG, Kościelisko, 2009. (pdf, BibTeX)
JavaServer Faces (JSF) and Enterprise JavaBeans (EJB) are two key technologies of the Java EE platform. The former is currently the primary technology for implementing the presentation layer in Java EE applications, while the latter has for years had the status of the "official" technology for the business logic layer. This paper is devoted to the cooperation of JSF and EJB in Java EE applications. So far this cooperation has not been standardized, so developers have to rely on their own solutions or on those offered by existing application frameworks. The aim of the paper is to present and compare the approaches to integrating JSF and EJB within the Oracle ADF and JBoss Seam application frameworks.
Piotr Jędrzejczak, Marek Wojciechowski: 'Ścieżki dostępu do danych w odkrywaniu zbiorów częstych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-02/09, 2009.
Damian Mierzwiński, Marek Wojciechowski: 'Biblioteka do obsługi przepisywania zapytań eksploracyjnych dla systemu Oracle 11g', Raport Instytutu Informatyki Politechniki Poznańskiej RB-03/09, 2009.
Paweł Boiński, Konrad Jóźwiak, Marek Wojciechowski, Maciej Zakrzewicz: 'Query Partitioning Algorithms for Concurrent Processing of Large Batches of Frequent Itemset Queries', Technical Report RA-03/09, Poznan University of Technology, 2009.
Krzysztof Jankiewicz, Mariusz Masewicz, Marek Wojciechowski, Maciej Zakrzewicz: 'Oracle Application Express (APEX): tworzenie aplikacji WWW', Materiały XI Szkoły PLOUG, Poznań, 2010.
Piotr Jędrzejczak, Marek Wojciechowski: 'Ścieżki dostępu do danych dla algorytmów przetwarzania zbiorów zapytań eksploracyjnych', Materiały III Krajowej Konferencji Naukowej Technologie Przetwarzania Danych (TPD 2010), Poznań, 2010. (pdf, BibTeX)
Frequent itemset discovery can be regarded as the execution of advanced database queries in which the user specifies constraints on the selection of source data and on the patterns to be discovered. Since such data mining queries can be submitted to the database in batch mode, e.g., during periods of low database server load, a number of algorithms for processing sets of data mining queries have been proposed. These methods exploit the sharing of source data among the queries and operate on logical data partitions determined by the overlapping of the source datasets. All of the algorithms proposed so far have been tested only under the assumption that direct access paths to individual partitions are available, usually on flat files. In practice, the data to be mined is most often stored in databases, where, first, many different data access paths are potentially available and, second, only some of them are generally available for particular data selection conditions. The aim of this paper is a theoretical and practical analysis of the behavior of algorithms for processing sets of data mining queries in the context of different data access paths.
Damian Mierzwiński, Marek Wojciechowski: 'Materializowane perspektywy eksploracyjne w praktyce: Wnioski z implementacji w Oracle 11g', Materiały XVI konf. PLOUG, Kościelisko, 2010. (pdf, BibTeX)
Materialized views are commonly used in modern database management systems, in particular to reduce the execution time of analytical queries in data warehouses. In recent years, proposals have appeared in the scientific literature to extend the applications of materialized views to selected data mining techniques. The paper is devoted to a prototype implementation in Oracle 11g of one of the approaches proposed in the literature to using the materialized results of previous data mining queries in the context of frequent itemset discovery.
Piotr Jedrzejczak, Marek Wojciechowski: 'Integrated Candidate Generation in Processing Batches of Frequent Itemset Queries Using Apriori', Proc. of the 2nd International Conference on Knowledge Discovery and Information Retrieval, Valencia, Spain, 2010. (pdf, BibTeX)
Frequent itemset mining can be regarded as advanced database querying where a user specifies constraints on the source dataset and patterns to be discovered. Since such frequent itemset queries can be submitted to the data mining system in batches, a natural question arises whether a batch of queries can be processed more efficiently than by executing each query individually. So far, two methods of processing batches of frequent itemset queries have been proposed for the Apriori algorithm: Common Counting, which integrates only the database scans required to process the queries, and Common Candidate Tree, which extends the concept by allowing the queries to also share their main memory structures. In this paper we propose a new method called Common Candidates, which further integrates processing of the queries from a batch by performing integrated candidate generation.
Marek Wojciechowski: 'A Survey and Comparative Analysis of the Methods of Processing Sets of Frequent Itemset Queries', Technical Report RA-15/10, Poznan University of Technology, 2010.
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. This report is devoted to the problem of processing sets of frequent itemset queries, which brings the ideas of multiple-query optimization to the domain of data mining. Several methods of solving the mentioned problem have been proposed so far, differing in the level of shared computations among the queries and the frequent itemset mining algorithms to which they are applicable. In this report we review the most important methods of processing sets of frequent itemset queries and compare them according to various criteria.
Piotr Jedrzejczak, Marek Wojciechowski: 'Data Access Paths in Processing of Sets of Frequent Itemset Queries', Proc. of the 19th International Symposium on Methodologies for Intelligent Systems (ISMIS 2011), Warsaw, Poland, LNAI 6804, © Springer, 2011. (pdf, BibTeX)
Frequent itemset mining can be regarded as advanced database querying where a user specifies the dataset to be mined and constraints to be satisfied by the discovered itemsets. One of the research directions influenced by the above observation is the processing of sets of frequent itemset queries operating on overlapping datasets. Several methods of solving this problem have been proposed, all of them assuming selective access to the partitions of data determined by the overlapping of queries, and tested so far only on flat files. In this paper we theoretically and experimentally analyze the influence of data access paths available in database systems on the methods of frequent itemset query set processing, which is crucial from the point of view of their possible applications.
Marek Wojciechowski: 'Dostosowywanie wyglądu aplikacji ADF Faces', Materiały XVII konf. PLOUG, Kościelisko, 2011. (pdf, BibTeX)
ADF Faces is a library of graphical components developed by Oracle as part of the Oracle Application Development Framework (ADF) for building user interfaces in applications based on JavaServer Faces (JSF) technology. The component-based approach to building the web pages that form the application interface, which is characteristic of JSF and therefore also of ADF Faces, isolates the developer from the resulting HTML code and thereby limits the ability to influence the application's appearance through CSS style sheets. To give application developers more control over the look and feel than JSF itself provides, ADF Faces offers a mechanism of so-called skins, which allow the formatting of individual page components to be specified precisely. The aim of the paper is to present how skins work, how they are created and used in applications, and the support Oracle tools provide for creating and editing skins.
Jacek Pospieszyński, Marek Wojciechowski: 'BigTable na tle systemów relacyjnych baz danych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-10/12, 2012.
Marek Wojciechowski, Maciej Zakrzewicz, Pawel Boinski: 'Integration of Dataset Scans in Processing Sets of Frequent Itemset Queries', Data Mining: Foundations and Intelligent Paradigms, Volume 1: Clustering, Association and Classification, Springer, 2012. (Info, BibTeX)
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. In this chapter we address the problem of processing sets of frequent itemset queries, which brings the ideas of multiple-query optimization to the domain of data mining. The most attractive method of solving the problem with respect to possible practical applications is Common Counting which consists in concurrent execution of the queries using Apriori with the integration of scans of the parts of the database shared among the queries. The major advantage of Common Counting over its alternatives is its applicability to arbitrarily large batches of queries. If the memory structures of all the queries to be processed by Common Counting do not fit together in main memory, the set of queries has to be partitioned into subsets processed in several phases. We formalize the problem of dividing the set of queries for Common Counting as a specific case of hypergraph partitioning and provide a comprehensive overview of query set partitioning algorithms proposed so far.
Monika Rokosik, Marek Wojciechowski: 'Przetwarzanie strumieni zapytań eksploracyjnych dla algorytmu Apriori', Raport Instytutu Informatyki Politechniki Poznańskiej RB-02/2013, 2013.
Monika Rokosik, Marek Wojciechowski: 'Processing of Streams of Frequent Itemset Queries', New Trends in Database and Information Systems II, Selected papers of the 18th East European Conference on Advances in Databases and Information Systems and Associated Satellite Events, ADBIS 2014, Ohrid, Macedonia, AISC 312, © Springer, 2015. (pdf, BibTeX)
Frequent itemset mining is one of the fundamental data mining problems and shares many similarities with traditional database querying. Hence, several query optimization techniques known from database systems have been successfully applied to frequent itemset queries, including reusing results of previous queries and multi-query optimization. In this paper, we consider a new problem of processing streams of incoming frequent itemset queries, where, as in multi-query optimization, a number of queries are executed together and share some of their operations, but, unlike in previously considered scenarios, new queries are dynamically added to the currently processed set of queries.
Michał Kleszcz, Marek Wojciechowski: 'Nowe algorytmy partycjonowania hipergrafu dla problemu przetwarzania zbiorów zapytań eksploracyjnych', Raport Instytutu Informatyki Politechniki Poznańskiej RB-4/14, 2014.
Szymon Dolata, Marek Wojciechowski: 'Integracja drzew prefiksowych w przetwarzaniu zbiorów zapytań eksploracyjnych algorytmem Apriori', Raport Instytutu Informatyki Politechniki Poznańskiej RB-5/14, 2014.
Łukasz Wojtczak, Marek Wojciechowski: 'Sentiment analysis of comments published on the internet', Technical Report RA-12/16, Poznan University of Technology, 2016.
Pawel Glawinski, Marek Wojciechowski, Maciej Zakrzewicz: 'Anomaly Detection Service for Financial Data Streams', Proc. of CGW Workshop '17, Krakow, Poland, 2017.
Agnieszka Koperska, Marek Wojciechowski: 'Odkrywanie zbiorów częstych w bioinformatyce', Raport Instytutu Informatyki Politechniki Poznańskiej RB-2/18, 2018.
Maciej Zakrzewicz, Marek Wojciechowski, Pawel Glawinski: 'Solution Pattern for Anomaly Detection in Financial Data Streams', New Trends in Databases and Information Systems, ADBIS 2019 Short Papers, Workshops BBIGAP, QAUCA, SemBDM, SIMPDA, M2P, MADEISD, and Doctoral Consortium, Bled, Slovenia, CCIS 1064, © Springer, 2019. (pdf, BibTeX)
Anomaly detection in versatile financial data streams is a vital business problem. Existing IT solutions for business anomaly detection usually rely on explicit Complex Event Processing or near-real-time Business Activity Monitoring. In this paper we argue that business anomaly detection should be considered an implicit infrastructural BPM service and we propose a corresponding Solution Pattern. We describe how a Business Anomaly Detector can be architected and designed in order to handle fast dynamic streams of business objects in BPM environments. The presented solution has been practically verified in an Oracle SOA/BPM Suite environment which handled real-life financial controlling business processes.


e-mail: Marek.Wojciechowski@cs.put.poznan.pl