Yesterday, on Earth Day, IBM and the European Space Agency (ESA) announced the launch of TerraMind, a generative AI foundation model designed to analyze, interpret, and anticipate planetary dynamics using multimodal geospatial data.
This launch is part of FAST-EO (Foundation Models for Advanced Space-based Earth Observation), a European initiative led by a consortium bringing together DLR (German Aerospace Center), Forschungszentrum Jülich, IBM Research Europe, and KP Labs, with the scientific and financial support of ESA's Φ-lab, the agency's innovation lab dedicated to Earth sciences.
The goal of FAST-EO is to democratize access to foundation models within the Earth observation (EO) community and encourage their adoption in high-stakes areas—sustainable natural resource management, biodiversity preservation, climate disaster prevention, and agro-environmental systems analysis.
It is within this framework that TerraMind was developed. The model was pre-trained at Forschungszentrum Jülich on "TerraMesh," the most extensive geospatial dataset ever assembled. This corpus includes over 9 million samples covering nine distinct modalities, including optical and radar imagery from the Copernicus Sentinel-1 and Sentinel-2 satellites, textual representations of the environment, geomorphology, and historical climate data.
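To make the notion of a co-registered multimodal sample concrete, the sketch below shows how one TerraMesh-style record could be laid out in Python. Field names, array shapes, and values are illustrative assumptions, not the actual TerraMesh schema.

```python
import numpy as np

# Illustrative (hypothetical) layout of one co-registered multimodal sample:
# every modality describes the same geographic tile, so a model can learn
# cross-modal correlations (e.g. radar backscatter vs. optical reflectance).
sample = {
    # Optical imagery: Sentinel-2 style, 13 spectral bands, 224x224 pixel tile
    "sentinel2_l2a": np.zeros((13, 224, 224), dtype=np.float32),
    # Radar imagery: Sentinel-1 style, VV/VH polarizations
    "sentinel1_grd": np.zeros((2, 224, 224), dtype=np.float32),
    # Elevation / geomorphology as a single-band raster
    "dem": np.zeros((1, 224, 224), dtype=np.float32),
    # Land-use / land-cover map as integer class labels per pixel
    "lulc": np.zeros((224, 224), dtype=np.int64),
    # Free-text description of the scene (the "textual representation")
    "caption": "Irrigated cropland bordered by a river floodplain.",
    # Tile geolocation (latitude, longitude of the tile centre)
    "latlon": (48.14, 11.58),
}

# Print each modality with its shape (or type, for non-array fields).
for name, value in sample.items():
    shape = getattr(value, "shape", None)
    print(f"{name:15s} -> {shape if shape is not None else type(value).__name__}")
```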
Based on a symmetric transformer encoder-decoder architecture, TerraMind can simultaneously process pixel, token, and sequence inputs. It can, for example, cross-reference vegetation cover dynamics with past meteorological trends and land use descriptions to identify emerging risks or model an ecosystem's evolution.
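To give a sense of what a symmetric encoder-decoder over pixel, token, and sequence inputs can look like, here is a deliberately simplified PyTorch sketch. It is a toy illustration under assumed dimensions and layer counts, not TerraMind's actual implementation: each modality is projected into a shared token space, fused into one stream, encoded, and then decoded into tokens of a target modality.

```python
import torch
import torch.nn as nn


class ToyMultimodalEncoderDecoder(nn.Module):
    """Toy symmetric encoder-decoder over mixed pixel and token inputs.

    A didactic simplification: pixel inputs are patch-embedded, discrete
    tokens (text or tokenized maps) are embedded, and both are concatenated
    into a single token stream before encoding.
    """

    def __init__(self, dim=256, patch=16, img_channels=13, vocab=8192, depth=4):
        super().__init__()
        # Pixel-level input: non-overlapping patches -> embedding vectors
        self.patch_embed = nn.Conv2d(img_channels, dim, kernel_size=patch, stride=patch)
        # Token-level / sequence-level inputs share a lookup embedding here
        self.token_embed = nn.Embedding(vocab, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=depth)
        self.to_vocab = nn.Linear(dim, vocab)  # predict target-modality tokens

    def forward(self, image, tokens, target_queries):
        # image:  (B, C, H, W) pixel input, e.g. an optical tile
        # tokens: (B, T) discrete tokens, e.g. text or a tokenized land-cover map
        img_tok = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        txt_tok = self.token_embed(tokens)                            # (B, T, dim)
        fused = torch.cat([img_tok, txt_tok], dim=1)                  # one stream
        memory = self.encoder(fused)
        decoded = self.decoder(self.token_embed(target_queries), memory)
        return self.to_vocab(decoded)  # logits over target-modality tokens


model = ToyMultimodalEncoderDecoder()
logits = model(
    image=torch.randn(1, 13, 224, 224),              # pixel-level modality
    tokens=torch.randint(0, 8192, (1, 32)),          # sequence-level modality
    target_queries=torch.randint(0, 8192, (1, 64)),  # tokens to generate
)
print(logits.shape)  # torch.Size([1, 64, 8192])
```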
A Breakthrough Innovation: Thinking-in-Modalities (TiM)
Beyond its ability to process a massive volume of heterogeneous data, TerraMind introduces a methodological advance: Thinking-in-Modalities (TiM). According to its developers, it is the first truly generative, multimodal foundation model applied to Earth observation. The TiM approach allows the model to generate artificial data on its own when inputs are missing, a common situation in remote sensing due to cloud cover, variable sensor resolution, or temporal gaps in observation series.
The originality of the process lies in contextual reasoning across modalities. Inspired by the chains of thought used in LLMs, the TiM mechanism enables the model to combine, extrapolate, and reconstruct data from learned correlations between images, text, and physical or geographical variables. During fine-tuning or inference, this ability to enrich a partial context not only improves the model's robustness but also sharpens its responses in specific situations.
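The idea can be summarized in a few lines of Python. The sketch below is a schematic of the two-step pattern described above (synthesize the missing modality, then predict on the enriched context); the functions `generate_modality` and `predict` are hypothetical placeholders standing in for the model's generative and task heads, not TerraMind's actual API.

```python
# Minimal sketch of the Thinking-in-Modalities idea: when an input modality is
# missing, first generate a plausible stand-in from the modalities that are
# present, then run the downstream task on the enriched context.

def thinking_in_modalities(inputs, required, generate_modality, predict):
    """inputs: dict mapping modality name -> data (None if unavailable)."""
    context = {k: v for k, v in inputs.items() if v is not None}
    for modality in required:
        if modality not in context:
            # Step 1: "think" in the missing modality by synthesizing it
            # from learned cross-modal correlations (e.g. radar -> optical).
            context[modality] = generate_modality(target=modality, context=context)
    # Step 2: answer the actual question using the enriched context.
    return predict(context)


# Usage with toy stand-ins: a cloud-covered optical tile is reconstructed
# from the radar and elevation data that are available.
result = thinking_in_modalities(
    inputs={"sentinel1": "radar_tile", "sentinel2": None, "dem": "elevation_tile"},
    required=["sentinel1", "sentinel2", "dem"],
    generate_modality=lambda target, context: f"synthetic_{target}",
    predict=lambda ctx: f"land-cover map from {sorted(ctx)}",
)
print(result)
```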
Applying this technique to problems such as water scarcity prediction, which involves diverse variables (climate, land use, vegetation, hydrography, agricultural practices), illustrates its operational potential in settings where traditional approaches have run up against data silos or temporal gaps.
Optimized Efficiency
Despite its scale (over 500 billion tokens seen during pre-training), TerraMind is a particularly efficient model. Thanks to its architecture and its effective compression of representations, it consumes ten times fewer compute resources than comparable models on similar tasks. This efficiency opens up concrete prospects for large-scale deployment, even in environments constrained by computing power or connectivity.
It also leads on performance: evaluated by ESA on PANGAEA, a standard community benchmark, TerraMind outperformed 12 popular Earth observation foundation models by 8% or more on real-world tasks such as land cover classification, change detection, environmental monitoring, and multi-sensor, multi-temporal analysis.
The model is part of IBM's strategy for climate and environmental AI, complementing the IBM-NASA Prithvi and Granite models. Its availability on IBM Geospatial Studio and Hugging Face enhances its accessibility and interoperability.
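For readers who want to experiment with the released weights, the snippet below sketches how one might pull the checkpoint from Hugging Face with the generic `huggingface_hub` client. The repository ID shown is an assumption based on the announcement and may differ from the published name; check the IBM/ESA organization pages on Hugging Face for the exact identifier and the loading tooling recommended in the model card.

```python
from huggingface_hub import snapshot_download

# Assumed repository ID -- verify the exact name on the Hugging Face hub before use.
REPO_ID = "ibm-esa-geospatial/TerraMind-1.0-base"

# Download the model checkpoint and configuration files to the local cache.
local_dir = snapshot_download(repo_id=REPO_ID)
print(f"TerraMind files downloaded to: {local_dir}")
# From here, load the weights with the tooling recommended in the model card
# (for example, IBM's TerraTorch fine-tuning toolkit).
```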
According to Nicolas Longepe, Earth Observation Data Scientist at ESA:
"This project is a perfect example of successful collaboration between the scientific community, leading tech companies, and experts to harness the potential of technology for Earth sciences. The synergy between Earth observation data experts, Machine learning specialists, data scientists, and high-performance computing (HPC) engineers is magical."
To Better Understand
What is Thinking-in-Modalities (TiM) and how does it work in the TerraMind model?
Thinking-in-Modalities (TiM) is an approach that allows TerraMind to generate artificial data when an input modality is missing, by combining information from the other available modalities, such as images and text. It is inspired by the chains of thought used in LLMs and relies on learned correlations between modalities to contextualize and extrapolate the missing data.