JoDS Special Issue

EVOLUTION and VERSIONING in SEMANTIC DATA INTEGRATION SYSTEMS

Guest Editors

Ladjel Bellatreche, National Engineering School for Mechanics and Aerotechnics, France
Robert Wrembel, Poznań University of Technology, Poland

Topics of interest include (but are not limited to) the following

User requirement evolution issues for semantic integration systems
Ontology evolution for semantic integration systems
Ontology evolution & ETL
Ontology evolution issues for virtual architectures
Ontology evolution issues for physical design of integration systems
Incremental selection of optimization structures
Ontology evolution & versioning
Inconsistency in evolving semantic information
Incremental reasoning
Query languages and OLAP tools for evolving data warehouses
Integrity constraints for evolving data structures
Indexing temporal and multiversion data
Metadata management and querying
Temporal and evolving ontologies
Self-organization in integration systems
Evolution of data and ontology issues on reconciliation, fusion, and alignment
Quality of evolving data
Security issues in evolving integration systems
Case studies and applications of ontology evolution
Tools to support evolving data and ontologies
Self-tuning, self-adjusting, and self-healing integration systems

Submission information

Each paper for submission should be formatted according to the rules available at Instructions for Authors. The final submission must be in LaTeX and should include the original source (including all style files and figures) and a PDF version of the compiled output.
Submissions should be sent to:

Program Committee (not complete yet)

Alberto Abello, Universitat Politecnica de Catalunya, Spain
Yamine Ait Ameur, ENSEEIHT Toulouse, France
Sonia Bergamaschi, Università di Modena e Reggio Emilia, Italy
Omar Boussaid, Lyon 2 University, France
Dickson K.W. Chiu, The Chinese University of Hong Kong, China
Oscar Corcho, Universidad Politécnica de Madrid, Spain
Alfredo Cuzzocrea, ICAR-CNR and University of Calabria, Italy
Johann Eder, University of Klagenfurt, Austria
Ahmad Ghazal, Teradata, USA
Marcin Gorawski, Silesian University of Technology, Poland
Jorge Gracia, Universidad Politécnica de Madrid, Spain
Stéphane Jean, Poitiers University, France
Haridimos Kondylakis, Foundation for Research & Technology-Hellas (FORTH), Greece
Jens Lechtenboerger, University of Muenster, Germany
Brahim Medjahed, University of Michigan - Dearborn, USA
Mukesh K Mohania, IBM India Research Lab, India
Carlos Ordonez, University of Houston, USA
Sudha Ram, University of Arizona, USA
Chantal Reynaud, Université Paris XI and INRIA Saclay - Ile-de-France, France
Alkis Simitsis, HP Labs, USA
David Taniar, Monash University, Australia
Juan Carlos Trujillo Mondéjar, Universidad de Alicante, Spain
Leandro Krug Wives, Universidade Federal do Rio Grande do Sul, Brazil

Overview

Data integration systems aim at integrating data from multiple heterogeneous, distributed, autonomous, and evolving data sources (DSs) to provide a uniform access interface to end users. Typically, integration systems are based on one of the following three architectures: materialized (where data from the sources are duplicated in a repository), virtual (where data are kept in their sources), and hybrid. A data warehouse (DW) dedicated to business applications is a good example of the materialized architecture. A DW includes several components: a DS layer, an extraction-transformation-loading (ETL) layer, a DW layer, and an on-line analytical processing (OLAP) layer. In the virtual architecture, a special component, called a mediator, provides an integrated view (a global schema) over the source schemas. User queries are expressed in terms of the global schema. A mediator provides a virtual database, translates user queries into specific queries on DSs, synthesizes the results of these queries, and returns answers to the user.

One of the main difficulties of building data integration systems is the heterogeneity of data sources. The semantics of data sources is usually implicit or unknown. Most DSs participating in the integration process were designed to serve day-to-day applications, not to be integrated in the future. Often, the small amount of semantics contained in their conceptual models is lost, since only their logical models are implemented and used by applications. A conceptual model allows designers to express the application requirements and domain knowledge in a form that is intelligible to a user. Thus, the absence of a conceptual model, or of any other semantic representation, in the final databases makes their interpretation and understanding complicated, even for designers who have good knowledge of the application domain. The heterogeneity of data sources affects both structure and semantics. To deal with semantic problems and enable automatic data integration, a large number of research studies propose the use of ontologies to describe the semantics of the various sources in data warehousing and mediator architectures. Ontologies have shown their effectiveness in materialized and virtual data integration systems. Recently, the database community has proposed solutions for building semantic DWs from sources referencing domain ontologies.

Methods used for designing semantic integration systems, research developments, and most of the commercially available technologies tacitly assumed that a semantic integration system is static. In practice, however, this assumption turned out to be false. A semantic integration system requires changes as the result of, among others: (1) the evolution of DSs, (2) changes of the real world represented in an integration system, (3) the evolution of the domain ontologies referenced by the sources and of the local ontology of the semantic integration system, (4) new user requirements, and (5) the creation of simulation scenarios (what-if analysis). As reported in the literature, structures of data sources change frequently. For example, the Wikipedia schema changed every 9-10 days on average during the last 4 years. From our experience, schemas of EDSs may change even more frequently. For example, telecommunication data sources changed their schemas every 7-13 days on average. Banking data sources are more stable, but they still changed their schemas every 2-4 weeks on average. Changes in the structures of DSs impact all the layers of the semantic integration system. Since such changes are frequent, developing a technology for handling them automatically or semi-automatically in a semantic integration system is of high practical importance.

Existing approaches to handling the evolution of an integration system in general, and of data warehousing in particular, can be categorized as: (1) ETL evolution, (2) warehouse evolution, and (3) the evolution of optimization structures. ETL evolution has not received much attention from the research community so far. The most advanced approach, i.e., Hecataeus, only partially solves the problem. Open issues in this area still include: modeling ETL workflows, designing a taxonomy and rules for ETL evolution, deploying these rules, and plugging the evolution techniques into existing ETL engines.

The most suitable approaches to handling the evolution of a semantic integration system are based on schema evolution techniques, on temporal extensions, or on versioning. Schema evolution techniques are able to represent only the current integration schema and data, i.e., historical integration schema states are lost. Temporal extensions are able to handle multiple historical states of data, but they manage one invariant global schema. In versioning techniques, the evolution of an integration system is managed partially by means of schema versions and partially by means of data versions. The versioning techniques, although the most promising, still need development. They possess limited capabilities for querying integration versions, and few index structures for multiversion data have been developed. Moreover, there is a need to integrate temporal extensions (for managing data evolution) and versioning techniques into one consistent framework.

The selection of optimization structures during the physical design (materialized views, caching solutions, partitioning, indexing, parallelization, etc.) is usually done in a static way. The evolution of the different components of semantic integration systems has a significant impact on the final optimization structures, and this impact needs to be studied.

The aim of this special issue of the Journal on Data Semantics is twofold. First, to present new and challenging issues in managing evolution and versioning in semantic integration systems. Second, to present the current research and technological developments in this field.

Call for Papers: Journal on Data Semantics (JoDS)
Special Issue on
EVOLUTION and VERSIONING in SEMANTIC DATA INTEGRATION SYSTEMS

Key Dates

ADBIS 2012 Conference

About JoDS
