
DataMe

A Model Driven Software Production Method for Big Data Application Development

Field: National
Date: 01/01/2018 - 01/09/2021
Industry: Others, Health
Budget: 99,099

PROJECT INFORMATION

DESCRIPTION

The term Big Data is increasingly present in the development of software applications and services in different application areas such as health or the digital economy. The term is usually associated with technological concerns, related to solutions that manage and physically store large volumes of data. This interpretation has caused a proliferation of isolated Big Data technological solutions, generating a huge data chaos. However, a high-quality technological infrastructure is not enough if it lacks suitable mechanisms to organize and extract value from the stored data.
This project focuses on analysing, formalizing and solving the conceptual and methodological challenges that arise when developing applications and services based on Big Data in industrial environments. Starting from an ontology that describes this domain without ambiguity, and applying conceptual model-driven software development (MDSD) principles, we propose a conceptual model-driven method for developing Big Data applications (DataME). The goal is to define precise and rigorous conceptual models that drive the development of Big Data applications and services in order to provide business value. In this way, we introduce the enterprise perspective without focusing on technological parameters such as performance and scalability. In order to define this method, we will face the following four relevant scientific challenges:
(C1) In order to ensure value, we must establish a precise conceptualization of which information is relevant to organizations. This step is usually skipped, leading to technological solutions that do not fit organizational needs. From the methodological point of view, aligning organizational goals with technological solutions is essential.
(C2) Retrieving relevant knowledge from large volumes of data is only feasible after resolving the heterogeneity among the different data sources. However, this integration must ensure data quality to avoid the generation of incorrect knowledge. In order to address this challenge, we will propose a conceptual alignment strategy that ensures the quality of the integrated information and allows the relevant information to be used as a whole.
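The conceptual alignment idea in C2 can be illustrated with a minimal sketch: two hypothetical, heterogeneous genomic data sources are mapped onto a single conceptual entity, with a basic quality gate before integration. The source formats, field names, and the `Variant` entity below are illustrative assumptions, not part of the actual DataME method.

```python
# Illustrative sketch of conceptual alignment over heterogeneous sources.
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:          # unified conceptual entity (hypothetical)
    gene: str
    position: int
    significance: str   # e.g. "pathogenic", "benign", "uncertain"

def from_source_a(rec: dict) -> Variant:
    # Source A uses short, lowercase field names and string positions.
    return Variant(rec["gene"].upper(), int(rec["pos"]), rec["sig"].lower())

def from_source_b(rec: dict) -> Variant:
    # Source B nests the same information under different names.
    return Variant(rec["symbol"],
                   rec["location"]["start"],
                   rec["clinical"]["class"].lower())

def is_valid(v: Variant) -> bool:
    # Quality gate: reject records that would inject "wrong data".
    return bool(v.gene) and v.position > 0 and \
        v.significance in {"pathogenic", "benign", "uncertain"}

a = {"gene": "brca1", "pos": "43045629", "sig": "Pathogenic"}
b = {"symbol": "BRCA1", "location": {"start": 43045629},
     "clinical": {"class": "pathogenic"}}

integrated = {v for v in (from_source_a(a), from_source_b(b)) if is_valid(v)}
print(len(integrated))  # prints 1: both records map to one conceptual entity
```

Because both records are normalized onto the same conceptual schema, the set-based integration recognizes them as the same entity, which is exactly the kind of duplication and heterogeneity the alignment strategy must resolve.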
(C3) Detecting and selecting relevant knowledge from Big Data volumes is only possible with interaction mechanisms that allow the end user to search and access information easily and precisely. Identifying this kind of interaction also requires a conceptual-model perspective in which domain concepts guide the data operations that will provide value to the expert user.
(C4) Ensuring the quality and precision of results requires automatic testing methods that work in highly distributed environments. Without this quality, Big Data can turn into "Wrong Data", corrupting the knowledge obtained. In order to address this challenge, we see great potential in using the test automation tool TESTAR (testar.org), a result of the European project FITTEST (Future Internet Testing).
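The kind of automated testing mentioned in C4 can be sketched generically: tools in the scriptless-testing family that TESTAR belongs to repeatedly derive the actions available in the current state, pick one, execute it, and check oracles. The loop below is a toy illustration of that idea against a trivial invented system; none of these names reflect TESTAR's actual API.

```python
# Generic scriptless-testing loop: derive actions, pick, execute, check oracle.
import random

def derive_actions(state):
    # In a real tool this would inspect the GUI; here the state lists them.
    return state["actions"]

def execute(state, action):
    # Toy system under test: a counter that enters a fault state past a limit.
    value = state["value"] + (1 if action == "inc" else -1)
    return {"value": value, "actions": ["inc", "dec"], "crashed": abs(value) > 3}

def oracle_ok(state):
    # Oracle: the system must never be in the fault state.
    return not state["crashed"]

def run(seed=0, max_steps=50):
    rng = random.Random(seed)
    state = {"value": 0, "actions": ["inc", "dec"], "crashed": False}
    for step in range(max_steps):
        action = rng.choice(derive_actions(state))
        state = execute(state, action)
        if not oracle_ok(state):
            return ("fault", step)   # a fault was found automatically
    return ("ok", max_steps)

print(run())
```

The value of this style of testing for Big Data pipelines is that no test scripts have to be written or maintained by hand: faults are found by exploring the state space automatically, which is what makes it feasible in highly distributed environments.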
The DataME method will provide a holistic solution to these four challenges. As a proof of industrial applicability, we will apply the method to the development of a Big Data application for the management of genomic data in several organizations in the field.
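The model-driven idea underlying the method can also be sketched in miniature: a conceptual model (here, a couple of entity descriptions) drives the generation of a concrete artifact (here, SQL DDL). The entities, fields, and generator below are illustrative assumptions, not DataME's actual models or transformations.

```python
# Minimal MDSD sketch: a conceptual model drives artifact generation.
model = {
    "Patient": {"id": "INTEGER", "name": "TEXT"},
    "Sample":  {"id": "INTEGER", "patient_id": "INTEGER", "gene": "TEXT"},
}

def generate_ddl(model: dict) -> str:
    # Transform each conceptual entity into a CREATE TABLE statement.
    tables = []
    for entity, fields in model.items():
        cols = ",\n  ".join(f"{name} {sqltype}"
                            for name, sqltype in fields.items())
        tables.append(f"CREATE TABLE {entity} (\n  {cols}\n);")
    return "\n\n".join(tables)

print(generate_ddl(model))
```

The point of the sketch is the direction of the dependency: the conceptual model is the single source of truth, and the technological artifact is derived from it, so changes at the conceptual level propagate automatically.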


Impact

From the scientific point of view, the main contribution will be a method that brings together our previous experience in the fields of MDSD and foundational ontologies, in a domain as attractive as Big Data. The expected scientific impact is a method that combines both characteristics, which to the best of our knowledge does not currently exist. Such a method will also have an industrial impact, as it will improve the quality of Big Data-based software in terms of the business value it provides to organizations.
The dissemination plan is articulated from both a scientific and an industrial perspective. Regarding scientific dissemination, our main objective is to present the project results at international conferences in the fields of information systems development and conceptual modeling, such as CAiSE, RCIS, ER, ICSE, DEXA, MODELS, RE and ISD, as well as the DSDM workshop at the national level. Publishing in high-circulation international journals is another primary dissemination objective; targets include the Information Systems Journal, Data and Knowledge Engineering, ACM and IEEE Transactions on Software Engineering, ACM and IEEE Transactions on Information Systems, Information and Software Technology, and SoSyM. Specifically, we expect to publish an average of five papers per year at such conferences and one paper in a high-impact JCR journal, as well as to participate in workshops on related topics. Also noteworthy is the research team's organization of the ER 2017 conference, a highly prestigious international benchmark in the area of conceptual modeling, whose main research topic will be the fundamental role that conceptual modeling should play in the development of Big Data solutions.
One of the main industrial dissemination actions of the project will be the organization of scientific-industrial sessions within the framework of the conferences and workshops mentioned above. The aim of these sessions will be to present the project results to technology consultancies and to companies that carry out R&D&I work; at least two such events are expected to be organized. By holding them in the context of international conferences, we also seek to internationalize the dissemination of the project. The material provided at these sessions, as well as other relevant scientific material, will be published on a project web portal to support dissemination to the general public.
The transfer of R&D results is a goal that we will address first through the development of the case study. Several companies have already been contacted and have expressed their interest in using the method for the development or improvement of their information systems for genomic data management. Through these contacts, there are several initial transfer proposals: on the one hand, with the INCLIVA foundation, to improve hospital management; on the other, with the companies IMEGEN and GemBiosoft, to manage and integrate their genetic-diagnosis knowledge sources. We also do not rule out the development of European projects within the framework of Horizon 2020 or ITEA 3, or of national collaborative projects, in which additional transfer plans could be defined beyond those proposed here.

Entities

UPV

Contact information

Oscar Pastor López
University Professor (Universitat Politècnica de València). Director of the Research Center on Software Production Methods (PROS).

VRAIN

Technological capabilities

AI
Big data analytics technologies