| Issue | E3S Web Conf., Volume 297, 2021: The 4th International Conference of Computer Science and Renewable Energies (ICCSRE'2021) |
|---|---|
| Article Number | 01069 |
| Number of page(s) | 10 |
| DOI | https://doi.org/10.1051/e3sconf/202129701069 |
| Published online | 22 September 2021 |
A generic metadata management model for heterogeneous sources in a data warehouse
1 LIMA laboratory, Ibn Zohr University, Agadir, Avenue Tamsoult, Morocco
2 AS laboratory, Abdelmalek Essaadi University, Al Hoceima, Morocco
* Corresponding author: o.oukhouya@uiz.ac.ma
For more than three decades, data warehouses have been considered the only business intelligence storage system for enterprises. However, with the advent of big data, they have been modernized to support the variety and dynamics of data by adopting the data lake as a centralized repository for heterogeneous sources. Indeed, the data lake is characterized by its flexibility and performance when storing and analyzing data. However, because no schema is imposed on the data during ingestion, the data lake risks turning into a data swamp, so metadata management is essential to exploit it. In this paper, we present a conceptual metadata management model for the data lake. Our solution is based on a functional architecture of the data lake and on a set of features that make the metadata model generic. Furthermore, we present a set of transformation rules that translate our conceptual model into an OWL ontology.
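The abstract does not spell out the transformation rules themselves. Purely as an illustration, the following is a minimal sketch assuming the common UML-to-OWL mapping (class to `owl:Class`, generalization to `rdfs:subClassOf`, attribute to `owl:DatatypeProperty`, association to `owl:ObjectProperty`), emitting the ontology as Turtle text. The types `ConceptClass`, `Attribute`, `Association`, the function `to_turtle`, the `ex:` namespace URL, and the sample data-lake concepts are all hypothetical and are not taken from the paper's model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified conceptual-model elements (not the paper's metamodel).
@dataclass
class Attribute:
    name: str
    xsd_type: str  # XML Schema datatype local name, e.g. "string" or "dateTime"

@dataclass
class ConceptClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    parent: Optional[str] = None  # name of the generalizing class, if any

@dataclass
class Association:
    name: str
    source: str  # domain class name
    target: str  # range class name

def to_turtle(classes: List[ConceptClass], associations: List[Association],
              base: str = "http://example.org/metadata#") -> str:
    """Translate the conceptual elements into an OWL ontology in Turtle syntax."""
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for c in classes:
        # Assumed rule 1: each conceptual class becomes an owl:Class.
        lines.append(f"ex:{c.name} a owl:Class .")
        # Assumed rule 2: a generalization link becomes rdfs:subClassOf.
        if c.parent:
            lines.append(f"ex:{c.name} rdfs:subClassOf ex:{c.parent} .")
        # Assumed rule 3: each attribute becomes an owl:DatatypeProperty whose
        # domain is the owning class and whose range is an XSD datatype.
        for a in c.attributes:
            lines.append(
                f"ex:{a.name} a owl:DatatypeProperty ; "
                f"rdfs:domain ex:{c.name} ; rdfs:range xsd:{a.xsd_type} ."
            )
    # Assumed rule 4: each association becomes an owl:ObjectProperty between two classes.
    for r in associations:
        lines.append(
            f"ex:{r.name} a owl:ObjectProperty ; "
            f"rdfs:domain ex:{r.source} ; rdfs:range ex:{r.target} ."
        )
    return "\n".join(lines)

# Usage with hypothetical data-lake metadata concepts.
classes = [
    ConceptClass("DataSource", [Attribute("sourceName", "string")]),
    ConceptClass("Dataset", [Attribute("ingestionDate", "dateTime")]),
    ConceptClass("SemiStructuredDataset", parent="Dataset"),
]
associations = [Association("isExtractedFrom", "Dataset", "DataSource")]
ttl = to_turtle(classes, associations)
print(ttl)
```

Generating plain Turtle text keeps the sketch dependency-free; a real implementation would more likely build the graph with an RDF library and validate it with an OWL reasoner.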
© The Authors, published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.