Data Science

The Data Science Group offers a unified approach to advancing science through advanced analytics, including artificial intelligence and machine learning, in a high-performance computing environment.


Science as the Driver for Deep Learning

Data Enhancement & Scaling

Transforming raw satellite data into science-ready resources through innovative processing techniques and computational optimization

Science & Application Advances

Applying state-of-the-art models to Earth, planetary, solar, astrophysics, and biological/physical science datasets to glean insights

Deep Learning Activities

Applying the most advanced proprietary and open-source models to enormous science datasets using GSFC's high-performance computing and cloud resources

Scientific Software

Developing custom software applications to advance specific science objectives

Deep learning techniques and modern hardware resources have enabled better and faster science within our groups.

Learn about some of our featured projects below.


Weather Model for Mars (MarsCast)

Fine-tuning GraphCast, an Earth weather foundation model, on a small subset of data from the Mars Climate Database showed that such models can generalize to another planet's weather. The fine-tuned model reliably and accurately predicted multi-day temperatures at various atmospheric altitudes and learned physical properties across diurnal and seasonal cycles.
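GraphCast produces multi-day forecasts by autoregressive rollout: a learned model predicts the atmospheric state one time step ahead, and each prediction is fed back in as the next input. The sketch below illustrates only that rollout loop under stated assumptions. `rollout` and `toy_step` are hypothetical names, the state is simplified to a short list of temperatures at a few atmospheric levels (much less than what GraphCast actually consumes), and `toy_step` is an illustrative stand-in for the fine-tuned model, not MarsCast itself.

```python
from typing import Callable, List

State = List[float]  # hypothetical: temperatures (K) at a few atmospheric levels


def rollout(step: Callable[[State], State], initial: State, n_steps: int) -> List[State]:
    """Autoregressively apply a one-step model, feeding each prediction back in as input."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    trajectory: List[State] = []
    current = list(initial)
    for _ in range(n_steps):
        current = step(current)  # predict the next state from the previous prediction
        trajectory.append(current)
    return trajectory


def toy_step(temps: State) -> State:
    """Hypothetical stand-in for a learned model: relax each level 10% toward 200 K."""
    return [t + 0.1 * (200.0 - t) for t in temps]


if __name__ == "__main__":
    forecast = rollout(toy_step, [210.0, 190.0], n_steps=4)
    for i, state in enumerate(forecast, start=1):
        print(f"step {i}: {[round(t, 2) for t in state]}")
```

Because errors in one step feed into the next, rollout length is where a fine-tuned model's stability over diurnal and seasonal cycles is actually tested.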

Spaceborne VHR Image Processing Toolkit

The Enhanced Very High Resolution (EVHR) project uses a high-end computing environment to produce Top-of-Atmosphere (TOA) reflectance and digital elevation models (DEMs). Using DigitalGlobe very-high-resolution imagery from NASA's ADAPT archive, EVHR produces half-degree orthorectified mosaics with estimates of surface reflectance, along with DEMs.
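The conversion from at-sensor radiance to TOA reflectance follows a standard formula, ρ = π·L·d² / (ESUN·cos θs), where L is band-averaged spectral radiance, d is the Earth-Sun distance in astronomical units, ESUN is the band-averaged exo-atmospheric solar irradiance, and θs is the solar zenith angle. The sketch below implements only that formula; it is not EVHR's actual code, the function and parameter names are hypothetical, and it assumes radiance has already been derived from the raw digital numbers using the sensor's calibration metadata.

```python
import math


def toa_reflectance(radiance: float, esun: float, earth_sun_distance_au: float,
                    solar_zenith_deg: float) -> float:
    """Convert band-averaged at-sensor radiance to top-of-atmosphere reflectance.

    rho = pi * L * d^2 / (ESUN * cos(theta_s))

    radiance: at-sensor spectral radiance L, W m^-2 sr^-1 um^-1
    esun: band-averaged exo-atmospheric solar irradiance, W m^-2 um^-1
    earth_sun_distance_au: Earth-Sun distance d in astronomical units
    solar_zenith_deg: solar zenith angle theta_s in degrees
    """
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    if esun <= 0.0 or earth_sun_distance_au <= 0.0:
        raise ValueError("esun and earth_sun_distance_au must be positive")
    cos_sz = math.cos(math.radians(solar_zenith_deg))
    return math.pi * radiance * earth_sun_distance_au ** 2 / (esun * cos_sz)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from any real scene.
    rho = toa_reflectance(radiance=80.0, esun=1800.0,
                          earth_sun_distance_au=1.01, solar_zenith_deg=35.0)
    print(round(rho, 4))
```

Normalizing by solar geometry and Earth-Sun distance is what makes TOA reflectance comparable across scenes acquired at different dates and sun angles, which matters when many scenes are stitched into a mosaic.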

NASA's 3-Billion Parameter Geospatial Foundation Model: SatVision-TOA

SatVision-TOA demonstrates the untapped potential of moderate- to coarse-resolution data for deep learning in Earth observation. A 3-billion-parameter vision transformer trained on a dataset of 100 million MODIS TOA images, it establishes a scalable, open-source foundation for advancing atmospheric science, cloud analysis, and Earth system modeling, and its released weights and workflows aim to broaden participation and foster collaboration in remote sensing. The model captures diverse atmospheric and surface conditions and outperforms baseline methods in 3D cloud retrieval and environmental monitoring.
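A vision transformer consumes an image as a sequence of patch tokens. As a minimal sketch of that tokenization step only (not SatVision-TOA's actual architecture or preprocessing), the function below splits a channels-last (H, W, C) multi-band chip into flattened, non-overlapping square patches with NumPy, assuming a plain ViT-style patch embedding. The function name, patch size, and band count are hypothetical.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches in
    row-major order across the image: the token sequence a plain ViT would
    map to embeddings with a linear projection.
    """
    if image.ndim != 3:
        raise ValueError("expected an (H, W, C) array")
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    x = image.reshape(gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch_size, patch_size, c)
    return x.reshape(gh * gw, patch_size * patch_size * c)


if __name__ == "__main__":
    # Hypothetical 128x128 chip with 8 bands; shapes chosen for illustration only.
    chip = np.random.default_rng(0).random((128, 128, 8), dtype=np.float32)
    tokens = patchify(chip, patch_size=16)
    print(tokens.shape)  # (64, 2048)
```

Sequence length grows with image area divided by patch area, which is one reason coarse-resolution imagery like MODIS can cover large regions with a manageable number of tokens.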

The Discover Supercomputer

The centerpiece of the NASA Center for Climate Simulation (NCCS) is the "Discover" supercomputing cluster, an assembly of Linux scalable units with more than 129,000 cores, capable of over 6.8 petaflops, or 6,800 trillion floating-point operations per second. Discover is particularly suited to large, complex, communications-intensive problems employing large matrices, and to science applications that benefit from its software ecosystem.
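As a quick check on the quoted figures, the unit conversion and a rough per-core average work out as follows. The per-core value is simply the quoted peak divided by the quoted core count, not a measured or published per-core rate, and because both totals are stated as lower bounds ("over"), it is only approximate.

```latex
6.8\ \text{PFLOP/s} = 6.8 \times 10^{15}\ \text{FLOP/s} = 6{,}800 \times 10^{12}\ \text{FLOP/s}

\frac{6.8 \times 10^{15}\ \text{FLOP/s}}{1.29 \times 10^{5}\ \text{cores}}
  \approx 5.27 \times 10^{10}\ \text{FLOP/s per core}
  \approx 52.7\ \text{GFLOP/s per core}
```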


Science Managed Cloud Environment (SMCE)

The Science Managed Cloud Environment (SMCE) is a managed, Amazon Web Services (AWS)-based infrastructure for NASA-funded projects that can leverage cloud computing capabilities.

Although the SMCE was started to meet the needs of Advanced Information Systems Technology (AIST) projects, any NASA project that can leverage AWS public-cloud capabilities can get access to it.

Explore/ADAPT Science Cloud

Explore combines high-performance computing and virtualization technologies to create an on-site private cloud. This managed virtual machine (VM) environment is specifically designed for large-scale data analytics.

We work collaboratively to help you find solutions to data science and big data problems.

We support NASA initiatives that encourage open science while preserving your intellectual property. We are a fully staffed team with eleven Innovation Lab Team members. When we are not supporting specific projects, we explore new technologies and apply new techniques to existing use cases so that we are ready to help advance projects as the need arises.

Python, Jupyter, Pangeo, Intake, Dask, Xarray