
DataMe

A model-driven software production method for developing Big Data applications

Scope
National
Date
01/01/2018 - 01/09/2021
Sector
  • Others
  • Health
Budget
99099
Funder
Video

PROJECT INFORMATION

DESCRIPTION

The term Big Data is increasingly present in the development of software applications and services in different application areas such as health or the digital economy. The term is usually associated with technological concerns, related to solutions that manage and physically store large volumes of data. This interpretation has caused a proliferation of isolated Big Data technological solutions, generating considerable data chaos. However, a high-quality technological infrastructure is not enough if it lacks suitable mechanisms to organize and extract value from the stored data.
This project focuses on analysing, formalizing, and solving conceptual and methodological challenges that arise while developing Big Data applications and services in industrial environments. Starting from an ontology that describes this domain without ambiguity, and applying conceptual model-driven software development (MDSD) principles, we propose a conceptual model-driven method for developing Big Data applications (DataME). The goal is to define precise and rigorous conceptual models that drive the development of Big Data applications and services in order to provide business value. In this way, we introduce the enterprise perspective without focusing on technological parameters such as performance and scalability. To define this method, we will address the following four scientific challenges:
(C1) To ensure value, we must establish a precise conceptualization of which information is relevant for organizations. This step is usually skipped, leading to technological solutions that do not fit organizational needs. From the methodological point of view, aligning organizational goals with technological solutions is essential.
(C2) Retrieving relevant knowledge from large volumes of data is only feasible after resolving the heterogeneity among the different data sources. However, this integration must ensure data quality to avoid the generation of incorrect knowledge. To address this challenge, we will propose a conceptual alignment strategy that ensures the quality of the integrated information and allows relevant information to be used as a whole.
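To illustrate the kind of conceptual alignment C2 describes, the following minimal sketch (all entity, field, and function names are hypothetical, invented for illustration and not taken from the DataME method itself) maps records from two heterogeneous sources onto one shared conceptual model and applies a simple quality gate before integration:

```python
from dataclasses import dataclass
from typing import Optional

# Shared conceptual model: one canonical entity both sources map onto.
# All names here are illustrative assumptions, not part of DataME.
@dataclass
class Patient:
    patient_id: str
    birth_year: int
    diagnosis_code: Optional[str]  # e.g. an ICD-10 code

def from_hospital_record(rec: dict) -> Patient:
    """Align a record from source A (a hospital system export)."""
    return Patient(
        patient_id=rec["hisId"],
        birth_year=int(rec["dob"][:4]),   # "1975-03-02" -> 1975
        diagnosis_code=rec.get("icd10"),
    )

def from_lab_record(rec: dict) -> Patient:
    """Align a record from source B (a genomics lab export)."""
    return Patient(
        patient_id=rec["sample_owner"],
        birth_year=int(rec["year_of_birth"]),
        diagnosis_code=rec.get("dx") or None,
    )

def is_valid(p: Patient) -> bool:
    """Quality gate: reject records that would inject incorrect knowledge."""
    return bool(p.patient_id) and 1900 <= p.birth_year <= 2025

sources = [
    (from_hospital_record, {"hisId": "H42", "dob": "1975-03-02", "icd10": "C50"}),
    (from_lab_record, {"sample_owner": "H42", "year_of_birth": "1975", "dx": "C50"}),
    (from_lab_record, {"sample_owner": "", "year_of_birth": "1975"}),  # fails the gate
]

integrated = [p for align, rec in sources if is_valid(p := align(rec))]
print(len(integrated))  # -> 2 (one record rejected by the quality gate)
```

The key point the sketch conveys is that both the alignment and the quality checks are expressed against the shared conceptual model, not against each source-specific schema.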
(C3) Detecting and selecting relevant knowledge from Big Data volumes is only possible with interaction mechanisms that allow the end user to search and access information easily and precisely. Identifying these interactions also requires a conceptual-model perspective in which domain concepts guide the data operations that deliver value to the expert user.
(C4) Ensuring the quality and precision of results requires automatic testing methods that work in highly distributed environments. Without this quality, Big Data can turn into "Wrong Data", corrupting the knowledge obtained. To address this challenge, we see great potential in the test automation tool TESTAR (testar.org), a result of the European project FITTEST (Future Internet Testing).
The DataME method will provide a holistic solution to these four challenges. As proof of industrial applicability, we will apply the method to the development of a Big Data application for the management of genomic data in several organizations in the field.

Impact

From the scientific point of view, the main contribution will be a method that brings together our previous experience in MDSD and foundational ontologies in a domain as attractive as Big Data. The expected scientific impact is a method that combines both characteristics, which to our knowledge does not currently exist. The method will also have a marked industrial impact, since it will improve the quality of Big Data-based software in terms of business value for organizations.
The dissemination plan will be articulated from both a scientific and an industrial perspective. Regarding scientific dissemination, our main objective is to present the project results at international conferences in the fields of information systems development and conceptual modelling, such as CAiSE, RCIS, ER, ICSE, DEXA, MODELS, RE, and ISD, as well as the national DSDM workshop, among others. Another primary dissemination objective is to publish in widely read and prestigious international journals such as Information Systems Journal, Data and Knowledge Engineering, ACM and IEEE Transactions on Software Engineering, ACM and IEEE Transactions on Information Systems, Information and Software Technology, and SoSyM. Specifically, we expect to publish an average of five papers per year at these conferences and one article in a high-impact JCR journal, and for the researchers to participate in workshops on related topics. It is also worth highlighting that the research team is organizing the 2017 edition of the ER conference, the leading international venue in conceptual modelling, whose main research theme will be the fundamental role that conceptual modelling should play in the development of Big Data solutions.
One of the project's main industrial dissemination actions will be the organization of scientific-industrial workshops within the framework of the conferences and workshops mentioned above. These events will aim to present the project results both to technology consultancies and to companies that carry out R&D&I activities. We expect to organize at least two such events. Since they will be held in the context of international conferences, we also seek to internationalize the dissemination of the project. The material provided at these events, along with the relevant scientific material, will be published on a project web portal to reinforce dissemination to the general public.
Transfer of R&D results is a goal we will pursue first through the development of the proposed case study. Companies have already been contacted and have expressed interest in using the method to develop or improve their information systems for managing genomic data. Through these contacts, there are several initial proposals for transferring results: on the one hand, with the INCLIVA foundation to improve hospital management, and on the other, with the companies IMEGEN and GemBiosoft to manage and integrate their knowledge sources on genetic diagnoses. We also do not rule out developing European projects under Horizon 2020 or ITEA 3, or national collaborative projects, in which transfer plans additional to those proposed in this project would be defined.

Entities

UPV

CONTACT INFORMATION

Oscar Pastor López
Full Professor (Universitat Politècnica de València), Director of the Research Center on Software Production Methods (PROS)

VRAIN

TECHNOLOGICAL CAPABILITIES

AI
Big data analytics technologies