AIM and SCOPE
Over the last two decades, data mining has developed tremendously and has become a distinct and mature discipline. Numerous methods have been proposed to discover different representations of knowledge from data, and the number of their applications in various fields is continuously growing. Nevertheless, many current approaches focus on applying a single learning algorithm to a static data layout.
In many domains, such a simplistic strategy appears to be too restrictive. Modern automatic systems are able to collect huge volumes of data, often with a complex structure. This poses new challenges for current information systems with respect to storing, managing and processing data. It also highlights the need for new scalable algorithmic solutions that support data summarization, sampling and approximation.
This need is particularly visible in the emerging domain of stream data mining, where large volumes of data records are generated continuously. Data arriving at a high rate, often with dynamically changing characteristics, requires real-time or near-real-time analysis and imposes constraints on the amount of available operating memory. Of particular interest, therefore, are algorithms that are not only scalable but can also adapt to concept drift.
Such issues are considered by experts from different fields, such as SQL and non-SQL databases, data mining, machine learning, statistics and engineering. We believe it is time to enhance communication between these communities. The aim of this workshop is to gather researchers interested in discussing and introducing new algorithmic foundations and application aspects of mining difficult real-world data.
TOPICS of INTEREST
We invite all researchers and practitioners who are developing algorithms, systems, and applications to share their results, ideas, and experiences. Besides the main topic of complex data streams, related aspects of efficient processing of massive data and knowledge discovery from large databases are also welcome.
Suggested topics include (but are not limited to) the following:
- Integration of different, heterogeneous or distributed data sources
- Pre-processing, structuring and organizing complex data
- Handling complex values and complex patterns in data
- Mining text, web, multimedia, semi-structured, and graph data
- Scalability in processing large data volumes
- Sampling techniques for massive data
- Approximate processing
- Near-real-time analytics
- Handling machine-generated data
- Integrating parallel data streams
- Classification, clustering and frequent pattern mining from data streams
- Detecting and adapting to changes and concept drift in evolving data
- Incremental online learning algorithms
- Knowledge discovery from ubiquitous environments
- Privacy preservation in knowledge discovery from data
- Applications, especially in scientific data, medicine, text processing, web mining, image or multimedia analysis, sensor networks, and bioinformatics