The objective of the Saint George on a Bike (SGoaB) project is to improve the quality and quantity of open metadata associated with European cultural heritage (CH) images. The Barcelona Supercomputing Center relies on deep learning to train object detection and image recognition models, complemented by natural language processing (NLP), to produce new or enhanced image captions for the Europeana collections.
The Saint George on a Bike project began on September 1, 2019, and will end on August 31. Researchers from the Barcelona Supercomputing Center and the Europeana Foundation are collaborating to help cultural heritage institutions automatically describe and classify their artworks.
The Europeana Foundation is an organization commissioned by the European Commission to develop a digital cultural heritage platform for Europe. On this platform, millions of cultural heritage items from about 4,000 institutions across Europe are available online.
To achieve the project’s goal, the researchers had to:
- transcribe information about culture, symbols, and centuries of evolving iconographic traditions into a knowledge representation accessible to machine learning and artificial intelligence,
- extend conventional deep learning approaches, focused on image recognition, with the ability to decipher the complex pictorial language that characterizes iconographic symbols and sacred imagery.
SGoaB relies on deep learning to train object detection and image recognition models, complemented by natural language processing (NLP), to align the image content with the descriptive text. The approach combines classical object detection with analysis of the pictorial semantics of the image.
The steps of the approach
The researchers first defined the object classes relevant to the iconography. Then they trained convolutional neural networks (CNNs), combining the object tagging frequencies of open datasets with knowledge bases (DBpedia, Wikidata, Wikimedia Commons).
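The class definition and tag aggregation step can be sketched in pure Python. This is a minimal illustration, not the project's actual pipeline: the class names and the alias table are hypothetical, standing in for a taxonomy and "also known as" mappings that would come from knowledge bases such as DBpedia or Wikidata.

```python
from collections import Counter

# Hypothetical iconography classes; the project derives its real taxonomy
# from knowledge bases (DBpedia, Wikidata, Wikimedia Commons).
ICONOGRAPHY_CLASSES = {"halo", "lamb", "sword", "crown", "angel", "lily"}

def aggregate_tag_frequencies(dataset_tags, kb_aliases):
    """Count how often each iconography class is tagged across open
    datasets, folding knowledge-base aliases into canonical classes."""
    counts = Counter()
    for tags in dataset_tags:           # one tag list per image
        for tag in tags:
            canonical = kb_aliases.get(tag.lower(), tag.lower())
            if canonical in ICONOGRAPHY_CLASSES:
                counts[canonical] += 1
    return counts

# Toy example: two images; "aureole" is mapped to "halo" via an alias entry.
aliases = {"aureole": "halo"}
freqs = aggregate_tag_frequencies([["halo", "sword"], ["aureole", "dog"]], aliases)
print(freqs["halo"])   # -> 2
print(freqs["sword"])  # -> 1
```

Folding aliases into canonical classes before counting is what lets tagging conventions from different open datasets contribute to one consistent frequency table.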
Next, they considered image segmentation using improved Mask R-CNN (Mask Region-based Convolutional Neural Network) models, taking into account the painting style, the action, and the represented patterns.
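To make the segmentation output concrete, here is a small post-processing sketch. It assumes per-instance results shaped like those of common Mask R-CNN implementations (a class label, a confidence score, a binary mask); the threshold and labels are illustrative, not the project's actual settings.

```python
def filter_detections(instances, score_threshold=0.5):
    """Keep confident segmented instances and report each mask's pixel area."""
    kept = []
    for inst in instances:
        if inst["score"] < score_threshold:
            continue  # discard low-confidence instances
        area = sum(sum(row) for row in inst["mask"])  # count of 1-pixels
        kept.append({"label": inst["label"], "area": area})
    return kept

# Toy detections on a 2x2 image crop: one confident horse, one weak halo.
detections = [
    {"label": "horse", "score": 0.92, "mask": [[1, 1], [1, 0]]},
    {"label": "halo",  "score": 0.30, "mask": [[0, 1], [0, 0]]},
]
print(filter_detections(detections))  # -> [{'label': 'horse', 'area': 3}]
```

Mask areas like these are one way downstream steps can judge which depicted objects are prominent enough to mention in a caption.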
To generate the captions, the researchers use a pre-trained image feature extractor and a sequence processor built on a long short-term memory (LSTM) recurrent neural network, combined with a language model to infer the objects depicted. Finally, a decoder produces the caption text.
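The shape of that captioning loop can be sketched as follows. This is a toy illustration of greedy step-by-step decoding only: the vocabulary and the scoring function are stand-ins for the project's real feature extractor and LSTM language model, which this sketch does not implement.

```python
VOCAB = ["<end>", "saint", "george", "rides", "a", "horse"]

def toy_step_scores(image_features, generated):
    """Stand-in for one LSTM decoder step: score every vocabulary word
    given the image features and the words generated so far."""
    script = ["saint", "george", "rides", "a", "horse"]
    next_word = script[len(generated)] if len(generated) < len(script) else "<end>"
    return [1.0 if w == next_word else 0.0 for w in VOCAB]

def greedy_caption(image_features, max_len=10):
    """Emit one word per step until the end token, as a caption decoder does."""
    words = []
    for _ in range(max_len):
        scores = toy_step_scores(image_features, words)
        best = VOCAB[scores.index(max(scores))]   # greedy argmax decoding
        if best == "<end>":
            break
        words.append(best)
    return " ".join(words)

print(greedy_caption(image_features=[0.1, 0.7]))
# -> "saint george rides a horse"
```

In a real system the scores would come from the LSTM conditioned on the extracted image features, and greedy argmax is often replaced by beam search; the loop structure stays the same.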
High performance computing is essential for projects such as SGoaB where automatic image captioning requires the processing of large volumes of data. Augmenting image recognition processing with new images found in European iconography and NLP processing makes the task even more complex.
The Barcelona Supercomputing Center has the HPC infrastructure to support data and computationally intensive services, as well as read-write access capabilities for newly generated image datasets and metadata.
Attaching a good quality description to each digitized image should allow all users, including the visually impaired, to better understand the scope, nature, and relevance of a cultural heritage website’s content.
Maria-Cristina Marinescu, project coordinator, states:
“Our project will allow quick access to enriched cultural information, which can serve both cultural and social purposes, education, tourism, and possibly historians or anthropologists. Indirectly, citizens can benefit from better public services when these build on what the richer metadata we produce offers – such as web accessibility for the visually impaired, or stories that can expose social injustice, integration, and gender issues through bodies of cultural heritage and help create a more tolerant European identity.”