SystemX launches the third project of its AI research program on multi-source data exploitation

17 September 2021

SystemX, the technological research institute (IRT) dedicated to the digital engineering of the systems of the future, is launching the “Business Semantics for Multisource Data Mining” (SMD) project, the third project in its “Artificial Intelligence and Augmented Engineering” (IA2) research programme. This 48-month collaborative R&D project brings together several industrial and academic partners.

The aim of the initiative is to develop tools that hybridize symbolic AI and learning-based AI for the intelligence and knowledge management professions, in order to build and exploit knowledge from heterogeneous multi-source data and to support decision-making in static or dynamic environments.

A project to address a specific problem on massive volumes of heterogeneous data

This project stems from a topical question: how can precise, relevant and useful knowledge be produced, in a highly targeted industrial or commercial context, from massive volumes of heterogeneous data (video, image, text, speech, graphics, etc.) coming from multiple and varied sources? To answer this question, SystemX, along with Airbus Defence and Space SLC, Apsys, Ecosys, EDF, RTE and CentraleSupélec, decided to join forces to launch the SMD project. Sana Tmar, SMD project manager at IRT SystemX, states:

“The context of the project starts from a real and recurring observation that any industrial company could face during its digital evolution: the advent of a large volume of highly heterogeneous, unstructured and multisource data. As computing power increases, we are increasingly able to retrieve information. However, these attributes used alone do not provide an understanding comparable to that of a human being, hence the importance of coupling several approaches.”

Currently, companies have a multitude of data sources, often compartmentalized and containing information of a heterogeneous nature from a semantic, structural or syntactic point of view. The heterogeneity also lies in the systems and technologies used. The analysis of this data for decision-making purposes is often difficult to achieve, especially when it comes to cross-referencing this internal data with expert opinion and other external data (open data, web, social networks, etc.), and even more so when the processing must be carried out in real time.

Five use cases identified within the “Business Semantics for Multisource Data Mining” project

The challenge of this project is to prototype the ability to integrate and analyze these very large volumes of unstructured, heterogeneous, multisource data and to bring them together in a common environment for semantic processing. The goal is to offer business professionals relevant, synthetic and interpretable new knowledge that will help them make informed decisions, for example by identifying biases in business ontologies or detecting an atypical situation.

Below are the five use cases that have been identified:

The evaluation of an artificial intelligence that will process data streams (mainly video) in real time to detect risky situations on the fly. This AI is intended for operators managing very dense and sometimes complex situations (rescue operations after large-scale accidents, for example). It will require hybridizing the prior knowledge produced by experts in the field with the knowledge extracted by a deep learning algorithm (Airbus Defence and Space SLC).
Digitizing behavioral models of industrial facilities (hazard analysis, vulnerabilities) and exploiting content to develop safety / security diagnostics in relation to regulatory compliance issues. This will involve analysing turns of phrase, failure chains and irregular communications in natural language to detect abnormal situations that could constitute a safety risk (Apsys).
The analysis of heterogeneous data generally defined in a business format, and the ability to make the user autonomous in creating data models (graph ontologies) without the need for special expertise (Ecosys).
The redesign and formalization of requirements from heterogeneous data corpora. The need is to streamline the drafting of tenders and the checking of responses, in order to reduce costs and allow experts to be more efficient. This relies on hybridizing approaches, both for more efficient knowledge construction and for discovering correspondences between ontologies (ontology alignment) that may come from different businesses and different languages (EDF).
The integration and coupling of artificial intelligence approaches (symbolic AI and classical AI) with the business data from which an ontology will be built, with the aim of efficiently helping operators in the power network’s operating centres anticipate particular situations and manage them better (periods of intense activity, management of works, or incidents such as a line out of service) in a rapidly changing context (RTE).

To conclude, Sana Tmar mentions the ambitions of the SMD project:

“The SMD project aims to overcome a major obstacle to hybridizing knowledge representation and reasoning approaches (symbolic AI) with recent artificial intelligence approaches (e.g. deep learning) for heterogeneous data analysis.”

Translated from SystemX lance le troisième projet de son programme de recherche sur l’IA sur l’exploitation des données multi-sources