Data Science

The Data Science Group offers a unified approach to advance science through the application of advanced analytics including artificial intelligence and machine learning in a high-performance computing environment.

Meet Our People

Science as the Driver for Deep Learning.

Data Enhancement & Scaling

Transforming raw satellite data into science-ready resources through innovative processing techniques and computational optimization.

Science & Application Advances

Applying state-of-the-art models to Earth, planetary, solar, astrophysics, and biological/physical science datasets to glean insights.

Deep Learning Activities

Applying the most advanced proprietary and open-source models to enormous science datasets using GSFC's high-performance computing and cloud resources.

Scientific Software

Developing custom software applications to advance specific science objectives.

Deep learning techniques and modern hardware resources have enabled better and faster science within our groups.

Learn about some of our featured projects below.

Spatial distribution of Theil-Sen slopes showing the rate of change in probabilities of climatic suitability per five-year interval across the 40-year time series. Positive trends are indicated in green, negative trends in red. Statistically significant positive and negative trends at the 95% confidence level are shown in dark green and dark red, respectively. Colored outlines indicate the northern extent of Cassin’s Sparrow’s U.S. breeding (red) and non-breeding (blue) ranges. Cassin’s Sparrow’s summer breeding range encompasses all of the species’ winter non-breeding range.

Cutting-edge models for Conservation: Ensemble machine learning advances ecological forecasting and reveals 40 years of changing climatic suitability for an aridland bird

Cassin’s Sparrow (Peucaea cassinii) is an elusive resident of the southwestern United States. Notable for its desert adaptations and distinctive skylarking display, the species offers important insights into how aridland birds respond to a changing climate. Using ensemble machine learning and spatial analysis applied to tens of thousands of eBird records together with NASA’s MERRA-2 reanalysis, NASA researchers documented shifts in climatic suitability for Cassin’s Sparrow across the past four decades. These shifts appear to be altering the timing of the species’ breeding cycle, suggesting that seasonal climatic change may be driving both behavioral and evolutionary responses. Beyond improving understanding of Cassin’s Sparrow’s natural history, this work significantly advances ecological forecasting methods for aridland birds and highlights important implications for biodiversity monitoring, conservation practice, and ecosystem health under climate change.
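For readers unfamiliar with the Theil-Sen slopes mentioned in the figure caption, here is a minimal sketch of the estimator: the median of the slopes between every pair of points in a time series, which makes the trend estimate robust to outliers. The function name `theil_sen_slope` and the suitability values are hypothetical and purely illustrative. This is not the project's actual code or data, and it does not cover the significance testing shown in the map.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times, values):
    """Return the Theil-Sen slope: the median of all pairwise slopes."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1  # skip pairs with identical times (undefined slope)
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct times")
    return median(slopes)


# Hypothetical suitability probabilities for a single grid cell, one value
# per five-year interval of a 40-year series (made-up numbers, with one
# deliberate outlier at index 5).
intervals = list(range(8))
suitability = [0.42, 0.45, 0.44, 0.50, 0.53, 0.90, 0.58, 0.61]

# Change in suitability probability per five-year interval.
slope = theil_sen_slope(intervals, suitability)
print(f"Theil-Sen slope per interval: {slope:.3f}")
```

Applied cell by cell across a gridded dataset, an estimator like this yields a map of trends of the kind described in the caption. In practice a library routine such as `scipy.stats.theilslopes` would likely be used instead of a hand-written loop.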

Spaceborne VHR Image Processing Toolkit

The Enhanced Very High Resolution (EVHR) project uses a high-end computing environment to produce Top-of-Atmosphere (TOA) reflectance products and digital elevation models (DEMs). Using DigitalGlobe very-high-resolution imagery from NASA's ADAPT archive, EVHR produces one-half-degree orthorectified mosaics with estimates of surface reflectance and DEMs.

NASA's 3-Billion Parameter Geospatial Foundation Model: SatVision-TOA

SatVision-TOA demonstrates the untapped potential of leveraging moderate- to coarse-resolution data for deep learning in Earth observation. By training a 3-billion-parameter vision transformer on a 100-million-image MODIS TOA dataset, it establishes a scalable, open-source foundation for advancing atmospheric science, cloud analysis, and Earth system modeling. Its released weights and workflows aim to broaden participation and foster collaboration in remote sensing applications. SatVision-TOA captures diverse atmospheric and surface conditions. Additionally, the model improves performance in 3D cloud retrieval and environmental monitoring, surpassing baseline methods.

The Discover Supercomputer

The centerpiece of the NCCS is the "Discover" supercomputing cluster, an assembly of Linux scalable units with over 129,000 cores, capable of over 6.8 petaflops, or 6,800 trillion floating-point operations per second. Discover is particularly suited for large, complex, communications-intensive problems employing large matrices and for science applications, which benefit from its software ecosystem.

Science Managed Cloud Environment (SMCE)

The Science Managed Cloud Environment (SMCE) is a managed Amazon Web Services (AWS)-based infrastructure for NASA-funded projects that can leverage cloud computing capabilities.

While the SMCE was started to meet the needs of AIST projects, any NASA project that can leverage AWS public-cloud capabilities can get access to the SMCE.

Explore/ADAPT Science Cloud

Explore combines high-performance computing and virtualization technologies to create an on-site private cloud. This managed virtual machine (VM) environment is specifically designed for large-scale data analytics.

We work collaboratively to help you find solutions to data science and big data problems.

We support NASA initiatives encouraging open science while also preserving your intellectual property. We are a fully staffed team with eleven Innovation Lab Team members. When we are not supporting specific projects, we explore new technologies and apply new techniques to existing use cases so that we are ready to help advance projects when applicable.

Python, Jupyter, Pangeo, Intake, Dask, Xarray