Resources

Data is a key resource for creating successful machine learning applications. Benchmark datasets are particularly important, and the Earth sciences community has begun to address this need [147]. Open source models and source code are also very helpful, both to learn from previous efforts and to avoid reinventing the wheel. Finally, it is important to interact with a community of fellow practitioners, to find advice and potential partnerships.

Some helpful resources in these regards are listed below, in alphabetical order.

Climate Change AI

Web page: https://www.climatechange.ai/

Climate Change AI is an initiative that aims to catalyze impactful work at the intersection of AI (especially machine learning) and climate change. In a recently refreshed version of the open access paper ‘Tackling climate change with machine learning’ [1], the authors collectively catalog and rank many concrete opportunities for ML to make a tangible difference in climate change mitigation and adaptation, in electricity systems, transportation, buildings and cities, industry, farms and forests, carbon dioxide removal, and other domains.

ClimateSet

Web page: https://climateset.github.io

Climate models are important tools for analyzing climate change, and predicting its future impacts. ClimateSet [148] assembles a collection of inputs and outputs from 36 climate models from Input4MIPs and CMIP6, and makes them available in a conveniently preprocessed form for large-scale ML applications. It includes inputs and outputs for five SSP scenarios, four forcing agents, and two climatic variables: temperature and precipitation.

Earth on AWS

Web page: https://aws.amazon.com/earth/

Amazon Web Services (AWS) is a very large cloud computing platform, which provides computational resources of many different kinds. The platform hosts copies of several important Open Data datasets, including data from the Sentinel and Landsat satellite missions. In addition, Amazon has an open Call for Proposals for research that uses the Earth on AWS datasets to build scientific applications, and successful proposals can be granted cloud usage credits.

Earth System Science Data

Web page: https://www.earth-system-science-data.net/

Earth System Science Data (ESSD) is an international journal for the publication of articles on original, high-quality, and well-documented research data, with a clear emphasis on open access [149]. ESSD also offers a ‘living data’ process for evolving datasets, which are subject to regular updates or extensions.

Google Earth Engine

Web page: https://earthengine.google.com/

Google Earth Engine is an online platform for geospatial analysis and computations, which is provided free of charge for academic and research use. It contains several petabytes of curated datasets, including satellite data from USGS/NASA and ESA, all accessible through a common API. The portal contains a development environment in which source code and interactive maps are displayed side-by-side.

Hugging Face

Web page: https://huggingface.co/

Hugging Face fosters an AI community that builds, trains and deploys state-of-the-art models using open source machine learning tools. The portal contains a repository of models, a collection of datasets, a set of demonstration apps, and a suite of collaboration tools.

Kaggle

Web page: https://www.kaggle.com/

Kaggle is an online platform for hosting machine learning competitions, which has built a large community of data scientists and ML practitioners. It is open to any type of data, and contains many Earth-related entries, such as the ‘Understanding Clouds from Satellite Images’ competition, by the Max Planck Institute for Meteorology, the ‘LANL Earthquake EDA and Prediction’ competition, by Los Alamos National Laboratory, and others. It is also a place where datasets are catalogued, so as to be found more easily by the community. For instance, it hosts the ‘Climate Change: Earth Surface Temperature Data’ from Berkeley Earth.

ML4Earth

Web page: https://ml4earth.de

Machine Learning for Earth observation (ML4Earth) is a German national center of excellence, led by the Technical University of Munich. Among its activities, ML4Earth maintains a collection of benchmark data products, which consist of pre-labeled EO datasets and baseline/pre-trained AI models. This enables a researcher to get up and running more quickly when tackling a new EO task. The EarthNets platform [150] maintained by ML4Earth contains a categorization of over 400 EO datasets.

Pangeo

Web page: https://pangeo.io

Pangeo is a community working collaboratively to develop software and infrastructure that facilitate research in Big Data geoscience. The shared objective is to build an ecosystem of mutually compatible, open source geoscience software packages, following established best practices in the scientific Python community.

Radiant MLHub

Web page: https://mlhub.earth/

The Radiant Earth Foundation is a non-profit foundation, whose goal is to increase the positive impact of Earth Observation through machine learning. The Radiant MLHub brings together training data, models, and a community of participants with backgrounds in EO, geospatial data, and machine learning.

Sentinel Hub

Web page: https://www.sentinel-hub.com/

Satellite data from many Earth Observation programs, such as the Sentinel missions (ESA/Copernicus) and Landsat missions (NASA/USGS), are available at Sentinel Hub under a unified API. The website also contains a graphical ‘EO Browser’ which allows a visual exploration of datasets that are available for a given time span and geographical extent.

SpaceML

Web page: https://spaceml.org/

SpaceML is a machine learning toolbox, and a developer community that builds open science AI apps, for space science and exploration. It is part of the Frontier Development Lab (FDL), supported by NASA, DOE, and ESA.

WeatherBench 2

Web page: https://sites.research.google/weatherbench

In order to facilitate global weather forecasting with ML, in particular for the medium-range timeframe (1-14 days), Google Research has published the WeatherBench framework (now version 2). The framework enables the evaluation and comparison of various weather forecasting models, using open-source evaluation code. It also contains ground-truth and baseline datasets [151].
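As an illustration of what evaluating a forecast against ground truth can involve, the sketch below computes a latitude-weighted root mean square error (RMSE) on a regular latitude/longitude grid, a metric commonly used for global fields because grid cells shrink towards the poles. It is a minimal, standard-library example and not the WeatherBench 2 evaluation code; the function names, the grid layout, and the toy temperature values are assumptions made for illustration only.

```python
import math


def latitude_weights(lats_deg):
    """Return cos(latitude) weights, normalized so that their mean is 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE for 2D fields indexed as [lat][lon]."""
    weights = latitude_weights(lats_deg)
    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            # Squared error, scaled by the area weight of this latitude row.
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Hypothetical toy fields (temperature in kelvin) on a 3 x 2 grid.
lats = [-60.0, 0.0, 60.0]
truth = [[280.0, 281.0], [300.0, 301.0], [275.0, 276.0]]
forecast = [[281.0, 282.0], [300.0, 301.0], [275.0, 276.0]]

score = weighted_rmse(forecast, truth, lats)
print(f"Latitude-weighted RMSE: {score:.3f} K")
```

In this toy case the forecast error sits at 60°S, so it counts for less than the same error would at the equator. A real evaluation would read gridded forecast and reference data from files and would typically average the score over many forecast dates and lead times.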

References

[1]
D. Rolnick et al., “Tackling Climate Change with Machine Learning,” ACM Computing Surveys, vol. 55, no. 2, pp. 42:1–42:96, Feb. 2022, doi: 10.1145/3485128.
[147]
P. D. Dueben, M. G. Schultz, M. Chantry, D. J. Gagne, D. M. Hall, and A. McGovern, “Challenges and Benchmark Datasets for Machine Learning in the Atmospheric Sciences: Definition, Status, and Outlook,” Artificial Intelligence for the Earth Systems, vol. 1, no. 3, Jul. 2022, doi: 10.1175/AIES-D-21-0002.1.
[148]
J. Kaltenborn et al., “ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning,” Advances in Neural Information Processing Systems, vol. 36, pp. 21757–21792, Dec. 2023.
[149]
D. Carlson and T. Oda, “Editorial: Data publication ESSD goals, practices and recommendations,” Earth System Science Data, vol. 10, no. 4, pp. 2275–2278, Dec. 2018, doi: 10.5194/essd-10-2275-2018.
[150]
Z. Xiong, F. Zhang, Y. Wang, Y. Shi, and X. X. Zhu, “EarthNets: Empowering AI in Earth Observation.” arXiv, Dec. 2022. doi: 10.48550/arXiv.2210.04936.
[151]
S. Rasp et al., “WeatherBench 2: A benchmark for the next generation of data-driven global weather models,” Jan. 26, 2024. http://arxiv.org/abs/2308.15560 (accessed Feb. 20, 2024).