Cloud-optimize your scientific data
without copying it
Virtual Zarr enables fast, cloud-optimized access to archival data formats like netCDF and HDF5 without duplicating any data.
The Ecosystem
Three powerful tools working together to bring cloud-native workflows to your existing data archives.
VirtualiZarr
Create virtual Zarr stores from archival data formats using a familiar xarray API. Supports netCDF4, HDF5, FITS, and more.
Icechunk
A transactional storage engine for Zarr. Commit virtual references and gain version control, time travel, and distributed writes.
earthaccess
Search, download, or stream NASA Earth science data with just a few lines of code. Seamlessly integrates with virtual datacube workflows.
Why Virtual Zarr?
Unlock cloud-native performance for your legacy scientific data without the hassle of data migration.
Faster Processing
Analyze a year of TEMPO data in 10 minutes instead of hours. Virtual references enable efficient parallel access.
No Data Duplication
Create virtual datacubes that reference existing files. No need to copy or convert terabytes of archival data.
Works with Archives
Access netCDF, HDF5, and other legacy formats as if they were cloud-optimized Zarr stores.
Familiar Workflow
Use the xarray and Python tools you already know. Virtual Zarr integrates seamlessly with your existing code.
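To make the "No Data Duplication" idea concrete, here is a minimal conceptual sketch of what a virtual reference is: a small manifest that records where each chunk lives inside an existing file (path, byte offset, length), so a reader can fetch just those bytes. This is not the VirtualiZarr or Icechunk API. The file layout, the `write_archive`, `build_manifest`, and `read_chunk` helpers, and the manifest shape are all hypothetical, chosen only for illustration, and the example uses a local temporary file in place of a real archive.

```python
import struct
import tempfile
from pathlib import Path

HEADER_SIZE = 16   # hypothetical fixed-size header in the toy archive
CHUNK_VALUES = 4   # each chunk holds four float32 values
CHUNK_BYTES = CHUNK_VALUES * 4


def write_archive(path: Path) -> None:
    """Write a toy 'archival' file: a header followed by two binary chunks."""
    header = b"HDR".ljust(HEADER_SIZE, b"\x00")
    chunk0 = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    chunk1 = struct.pack("<4f", 5.0, 6.0, 7.0, 8.0)
    path.write_bytes(header + chunk0 + chunk1)


def build_manifest(path: Path) -> dict:
    """Record where each chunk lives. No chunk data is copied."""
    return {
        str(i): {
            "path": str(path),
            "offset": HEADER_SIZE + i * CHUNK_BYTES,
            "length": CHUNK_BYTES,
        }
        for i in range(2)
    }


def read_chunk(manifest: dict, key: str) -> list:
    """Fetch one chunk by reading only its byte range from the original file."""
    ref = manifest[key]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        raw = f.read(ref["length"])
    return list(struct.unpack(f"<{ref['length'] // 4}f", raw))


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive.bin"
    write_archive(archive)
    manifest = build_manifest(archive)
    # Only the second chunk's 16 bytes are read; the file itself is untouched.
    second_chunk = read_chunk(manifest, "1")
    print(second_chunk)
```

In a real deployment the byte-range read would typically be an HTTP range request against object storage rather than a local `seek`, and the manifest would be produced by parsing the archival file's own metadata instead of being hard-coded.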
Standing on the Shoulders of Giants
Virtual Zarr builds on decades of work in scientific data formats, remote data access, and computer science fundamentals.
OPeNDAP
Pioneered remote data access and the DMR++ metadata format
HDF Group
Chunk-level access and the foundations of scientific data storage
Kerchunk
Originated the concept of virtual Zarr references
fsspec
Python filesystem abstraction enabling cloud-native access
Contributors
Virtual Zarr is made possible by ASDC, ASF, CarbonPlan, Development Seed, Earthmover, GES DISC, LP DAAC, NASA Earthdata, NSIDC, OB.DAAC, Openscapes, ORNL DAAC, PO.DAAC, and the Data Systems Evolution team at NASA Marshall Space Flight Center.