EVOLUTION and VERSIONING in SEMANTIC DATA INTEGRATION SYSTEMS |
---|
Guest Editors |
---|
- Ladjel Bellatreche, National Engineering School for Mechanics and Aerotechnics, France
- Robert Wrembel, Poznań University of Technology, Poland
Topics of Interest include (but are not limited to) the following |
---|
- User requirement evolution issues for semantic integration systems
- Ontology evolution for semantic integration systems
- Ontology evolution & ETL
- Ontology evolution issues for virtual architectures
- Ontology evolution issues for physical design of integration systems
- Incremental selection of optimization structures
- Ontology evolution & versioning
- Inconsistency in evolving semantic information
- Incremental reasoning
- Query languages and OLAP tools for evolving data warehouses
- Integrity constraints for evolving data structures
- Indexing temporal and multiversion data
- Metadata management and querying
- Temporal and evolving ontologies
- Self-organization in integration systems
- Evolution of data and ontology issues on reconciliation, fusion, and alignment
- Quality of evolving data
- Security issues in evolving integration systems
- Case studies and applications of ontology evolution
- Tools to support evolving data and ontologies
- Self-tuning, self-adjusting, and self-healing integration systems
Submission information |
---|
Each paper for submission should be formatted according to the rules available at Instructions for Authors. The final submission must be in LaTeX and it should include the original source (including all style files and figures) and a PDF version of the compiled output. Submissions should be sent to:
Program Committee (not complete yet) |
---|
- Alberto Abello, Universitat Politecnica de Catalunya, Spain
- Yamine Ait Ameur, ENSEEIHT Toulouse, France
- Sonia Bergamaschi, Università di Modena e Reggio Emilia, Italy
- Omar Boussaid, Lyon 2 University, France
- Dickson K.W. Chiu, The Chinese University of Hong Kong, China
- Oscar Corcho, Universidad Politécnica de Madrid, Spain
- Alfredo Cuzzocrea, ICAR-CNR and University of Calabria, Italy
- Johann Eder, University of Klagenfurt, Austria
- Ahmad Ghazal, Teradata, USA
- Marcin Gorawski, Silesian University of Technology, Poland
- Jorge Gracia, Universidad Politécnica de Madrid, Spain
- Stéphane Jean, Poitiers University, France
- Haridimos Kondylakis, Foundation of Research & Technology-Hellas (FORTH), Greece
- Jens Lechtenboerger, University of Muenster, Germany
- Brahim Medjahed, University of Michigan - Dearborn, USA
- Mukesh K Mohania, IBM India Research Lab, India
- Carlos Ordonez, University of Houston, USA
- Sudha Ram, University of Arizona, USA
- Chantal Reynaud, Université Paris XI and INRIA Saclay - Ile-de-France, France
- Alkis Simitsis, HP Labs, USA
- David Taniar, Monash University, Australia
- Juan Carlos Trujillo Mondéjar, Universidad de Alicante, Spain
- Leandro Krug Wives, Universidade Federal do Rio Grande do Sul, Brazil
Overview |
---|
Data integration systems aim to integrate data from multiple heterogeneous, distributed, autonomous, and evolving data sources (DSs) and to provide end users with a uniform access interface. Typically, an integration system is based on one of three architectures: materialized (where data from the sources are duplicated in a repository), virtual (where data are kept in their sources), or hybrid. A data warehouse (DW) dedicated to business applications is a good example of the materialized architecture. A DW includes several components: a DS layer, an extraction-transformation-loading (ETL) layer, a DW layer, and an on-line analytical processing (OLAP) layer. In the virtual architecture, a special component, called a mediator, provides an integrated view (a global schema) over the source schemas. User queries are expressed in terms of the global schema. A mediator provides a virtual database, translates user queries into specific queries on the DSs, synthesizes the results of these queries, and returns answers to the user.
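The mediator's role in the virtual architecture can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the `Source` and `Mediator` classes, the attribute mappings, and the sample rows are all hypothetical, queries are restricted to a single equality condition, and real mediators rely on far richer mapping and query rewriting techniques.

```python
from typing import Dict, List

Row = Dict[str, object]


class Source:
    """A hypothetical data source that uses its own local attribute names."""

    def __init__(self, name: str, rows: List[Row], mapping: Dict[str, str]):
        self.name = name
        self.rows = rows
        # Assumed mapping: global attribute name -> local attribute name.
        self.mapping = mapping

    def query(self, local_attr: str, value: object) -> List[Row]:
        # Evaluate an equality selection in the source's local vocabulary.
        return [r for r in self.rows if r.get(local_attr) == value]


class Mediator:
    """Answers queries over the global schema without storing any data."""

    def __init__(self, sources: List[Source]):
        self.sources = sources

    def query(self, global_attr: str, value: object) -> List[Row]:
        answers: List[Row] = []
        for src in self.sources:
            local_attr = src.mapping.get(global_attr)
            if local_attr is None:
                continue  # this source does not expose the attribute
            # Inverse mapping: local attribute name -> global attribute name.
            inverse = {loc: glob for glob, loc in src.mapping.items()}
            for row in src.query(local_attr, value):
                # Translate each result row back into the global schema.
                answers.append(
                    {inverse[k]: v for k, v in row.items() if k in inverse}
                )
        return answers


crm = Source(
    "crm",
    [{"cust_name": "Ada", "town": "Poznan"}],
    {"name": "cust_name", "city": "town"},
)
billing = Source(
    "billing",
    [{"client": "Bob", "city": "Poznan"}, {"client": "Eve", "city": "Lyon"}],
    {"name": "client", "city": "city"},
)
mediator = Mediator([crm, billing])
result = mediator.query("city", "Poznan")
```

Here `result` contains one row per matching customer, expressed in the global attribute names `name` and `city`, although the two sources name these attributes differently.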
One of the main difficulties in building data integration systems is the heterogeneity of data sources. The semantics of data sources is usually implicit or unknown. Most DSs participating in the integration process were designed to serve day-to-day applications, not to be integrated in the future. Often, the small amount of semantics contained in their conceptual models is lost, since only their logical models are implemented and used by applications. A conceptual model allows designers to express application requirements and domain knowledge in a form intelligible to users. Thus, the absence of a conceptual model, or of any other semantic representation, in the final databases makes them difficult to interpret and understand, even for designers who know the application domain well. The heterogeneity of data sources affects both structure and semantics. To deal with semantic problems and to enable automatic data integration, a large number of research studies have proposed using ontologies to describe the semantics of the various sources in data warehousing and mediator architectures. Ontologies have proved effective in both materialized and virtual data integration systems. Recently, the database community has proposed solutions for building semantic DWs from sources that reference domain ontologies.
Methods for designing semantic integration systems, research developments, and most commercially available technologies have tacitly assumed that a semantic integration system is static. In practice, however, this assumption has turned out to be false. A semantic integration system must change as a result of, among others: (1) the evolution of DSs, (2) changes in the real world represented in the integration system, (3) the evolution of the domain ontologies referenced by the sources and of the local ontology of the semantic integration system, (4) new user requirements, and (5) the creation of simulation scenarios (what-if analysis). As reported in the literature, the structures of data sources change frequently. For example, the Wikipedia schema changed every 9-10 days on average during the last 4 years. From our experience, schemas of EDSs may change even more frequently. For example, telecommunication data sources changed their schemas every 7-13 days on average. Banking data sources are more stable, but they still changed their schemas every 2-4 weeks on average. Changes in the structures of DSs affect all the layers of a semantic integration system. Since such changes are frequent, developing a technology for handling them automatically or semi-automatically in a semantic integration system is of high practical importance.
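A first step toward handling structural changes automatically is detecting them. The following Python sketch compares two snapshots of a source schema and reports the differences. It is a simplified illustration: the `diff_schemas` function, the representation of a schema as a mapping from column names to type names, and the sample schemas are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed representation of a schema snapshot: column name -> type name.
Schema = Dict[str, str]


def diff_schemas(old: Schema, new: Schema) -> List[Tuple[str, str]]:
    """Return (change kind, column) pairs describing how `old` became `new`."""
    changes: List[Tuple[str, str]] = []
    for col in sorted(old.keys() - new.keys()):
        changes.append(("dropped", col))
    for col in sorted(new.keys() - old.keys()):
        changes.append(("added", col))
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(("retyped", col))
    return changes


old_schema = {"id": "int", "phone": "text", "balance": "int"}
new_schema = {"id": "int", "email": "text", "balance": "decimal"}
changes = diff_schemas(old_schema, new_schema)
```

A purely structural comparison of this kind cannot distinguish a renamed column from a dropped column followed by an added one. Resolving such cases requires semantic information about the source.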
Existing approaches to handling the evolution of an integration system in general, and of a data warehouse in particular, can be categorized as: (1) ETL evolution, (2) warehouse evolution, and (3) the evolution of optimization structures. ETL evolution has not received much attention from the research community so far. The most advanced approach, Hecataeus, only partially solves the problem. Open issues in this area include modeling ETL workflows, designing a taxonomy and rules for ETL evolution, deploying these rules, and plugging the evolution techniques into existing ETL engines.
The most suitable approaches to handling the evolution of a semantic integration system are based on schema evolution techniques, on temporal extensions, or on versioning. Schema evolution techniques can represent only the current integration schema and data, i.e., historical states of the integration schema are lost. Temporal extensions can handle multiple historical states of data, but they manage a single invariant global schema. In versioning techniques, the evolution of an integration system is managed partly by means of schema versions and partly by means of data versions. The versioning techniques, although the most promising, still need development. They possess limited capabilities for querying integration versions, and few index structures for multiversion data have been developed. Moreover, there is a need to integrate temporal extensions (for managing data evolution) and versioning techniques into one consistent framework.
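The idea behind versioning can be sketched in a few lines of Python. The sketch below keeps every schema version together with its data and its validity interval, so that a query can address a historical state. The `SchemaVersion` and `VersionedStore` classes, the integer logical timestamps, and the rule that existing rows are projected onto the new schema are assumptions made for this illustration. The sketch does not correspond to any particular published technique.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class SchemaVersion:
    version: int
    attributes: List[str]
    valid_from: int                   # logical timestamp (assumption)
    valid_to: Optional[int] = None    # None marks the current version
    rows: List[Row] = field(default_factory=list)


class VersionedStore:
    """Keeps all schema versions and their data instead of overwriting them."""

    def __init__(self, attributes: List[str], now: int = 0):
        self.versions = [SchemaVersion(1, list(attributes), now)]

    @property
    def current(self) -> SchemaVersion:
        return self.versions[-1]

    def insert(self, row: Row) -> None:
        # Store only the attributes known to the current schema version.
        attrs = self.current.attributes
        self.current.rows.append({a: row.get(a) for a in attrs})

    def evolve(self, attributes: List[str], now: int) -> SchemaVersion:
        # Close the current version; its schema and rows stay untouched.
        old = self.current
        old.valid_to = now
        new = SchemaVersion(old.version + 1, list(attributes), now)
        # Carry rows forward, projected onto the new schema (missing -> None).
        new.rows = [{a: r.get(a) for a in new.attributes} for r in old.rows]
        self.versions.append(new)
        return new

    def version_at(self, t: int) -> SchemaVersion:
        # Find the version whose validity interval [valid_from, valid_to) holds t.
        for v in self.versions:
            if v.valid_from <= t and (v.valid_to is None or t < v.valid_to):
                return v
        raise LookupError(f"no schema version valid at time {t}")


store = VersionedStore(["id", "phone"], now=0)
store.insert({"id": 1, "phone": "123"})
store.evolve(["id", "email"], now=10)
store.insert({"id": 2, "email": "b@x.org"})

past = store.version_at(5)
present = store.version_at(12)
```

In this example `past` still exposes the `phone` attribute and the row stored before the change, while `present` exposes `email`. A plain schema evolution approach would retain only the second state.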
The selection of optimization structures during physical design (materialized views, caching solutions, partitioning, indexing, parallelization, etc.) is usually done statically. The evolution of the different components of a semantic integration system has an important impact on the final optimization structures, and this impact needs to be studied.
The aim of this special issue of the Journal of Data Semantics is twofold: first, to present new and challenging issues in managing evolution and versioning in semantic integration systems; second, to present current research and technological developments in this field.