lazycogs vs odc-stac¶
lazycogs is structured very differently from odc-stac, here is a summary of the differences:
- odc-stac uses rasterio/GDAL for all i/o and warp operations, lazycogs uses async-geotiff and obstore for i/o and pyproj for warping
- odc-stac requires you to load all of the STAC items into memory in order to materialize the array view, lazycogs queries a stac-geoparquet file on disk during data reading operations only
This notebook compares run times for three operations using lazycogs and odc-stac.
Performance summary¶
Benchmarks run against odc-stac==0.5.2 on a laptop in a single run. Results will vary across machines and network conditions — treat these as indicative rather than definitive.
A note on chunking: each library is configured with the chunk strategy that suits its architecture. lazycogs uses chunks={"time": 1} with no spatial chunking, because it handles spatial windowing internally and reads only the pixels it needs regardless of chunk size. odc-stac uses chunks={"time": 1, "x": 512, "y": 512} because without spatial chunks, computing any spatial subset would require loading entire scenes. These configurations are not identical, but they represent the practical defaults a user would choose for each library.
| Operation | lazycogs | odc-stac |
|---|---|---|
| Load STAC items into memory | n/a | 10.68s |
| Initialize array | 3.22s | 44.95s |
| Extract point values (30 days) | 7.60s | 17.09s |
| Plot spatial subset (3 time steps) | 17.16s | 35.41s |
import time
from contextlib import contextmanager
from pathlib import Path
import odc.stac
import rustac
from odc.geo.geobox import GeoBox
from pyproj import Transformer
from pystac import Item
import lazycogs
@contextmanager
def timer(label="Task"):
t0 = time.perf_counter()
try:
yield
finally:
t = time.perf_counter() - t0
print(f"{label} took {t:.2f} seconds")
# define the AOI in/ a projection that is suitable for your analysis
dst_crs = "epsg:5070"
dst_bbox = (-700_000, 2_220_000, 600_000, 2_930_000)
resolution = 100
bands = ["red", "green", "blue"]
dtype = "int16"
# transform to epsg:4326 for STAC search
transformer = Transformer.from_crs(dst_crs, "epsg:4326", always_xy=True)
bbox_4326 = transformer.transform_bounds(*dst_bbox)
Query the STAC API and cache the results to parquet.
items_parquet = "/tmp/midwest_summer_2025_items.parquet"
if not Path(items_parquet).exists():
await rustac.search_to(
items_parquet,
href="https://earth-search.aws.element84.com/v1",
collections=["sentinel-2-c1-l2a"],
datetime="2025-06-01/2025-09-30",
bbox=bbox_4326,
limit=100,
)
Initialize the array with lazycogs
with timer("Initializing lazycogs array"):
lazycogs_da = await lazycogs.open_async(
items_parquet,
crs=dst_crs,
bbox=dst_bbox,
resolution=resolution,
time_period="P1D",
bands=bands,
dtype=dtype,
chunks={"time": 1},
)
lazycogs_da
Initializing lazycogs array took 3.22 seconds
<xarray.DataArray (band: 3, time: 121, y: 7100, x: 13000)> Size: 67GB
dask.array<array, shape=(3, 121, 7100, 13000), dtype=int16, chunksize=(3, 1, 7100, 13000), chunktype=numpy.ndarray>
Coordinates:
* band (band) <U5 60B 'red' 'green' 'blue'
* time (time) datetime64[s] 968B 2025-06-01 2025-06-02 ... 2025-09-30
* y (y) float64 57kB 2.22e+06 2.22e+06 2.22e+06 ... 2.93e+06 2.93e+06
* x (x) float64 104kB -7e+05 -6.998e+05 -6.998e+05 ... 5.998e+05 6e+05
Attributes:
_stac_backends: [StacBackendArray(band='red', shape=(121, 7100, 13000...
_stac_time_coords: TimeCoords([2025-06-01 … 2025-09-30], n=121)Now load the STAC items for odc-stac.
with timer("loading STAC items into memory"):
items = await rustac.search(items_parquet)
items = [Item.from_dict(item) for item in items]
loading STAC items into memory took 10.68 seconds
with timer("Initializing odc-stac array"):
odc_stac_da = odc.stac.load(
items,
bands=bands,
chunks={"time": 1, "x": 512, "y": 512},
groupby="solar_day",
geobox=GeoBox.from_bbox(
bbox=dst_bbox, crs=dst_crs, resolution=resolution, tight=True
),
use_overviews=True,
resampling="nearest", # only resampling method supported by lazycogs
)
odc_stac_da
Initializing odc-stac array took 44.95 seconds
<xarray.Dataset> Size: 68GB
Dimensions: (y: 7100, x: 13000, time: 122)
Coordinates:
* y (y) float64 57kB 2.93e+06 2.93e+06 ... 2.22e+06 2.22e+06
* x (x) float64 104kB -7e+05 -6.998e+05 ... 5.998e+05 6e+05
* time (time) datetime64[us] 976B 2025-06-01T17:00:18.294000 ... 20...
spatial_ref int32 4B 5070
Data variables:
red (time, y, x) uint16 23GB dask.array<chunksize=(1, 512, 512), meta=np.ndarray>
green (time, y, x) uint16 23GB dask.array<chunksize=(1, 512, 512), meta=np.ndarray>
blue (time, y, x) uint16 23GB dask.array<chunksize=(1, 512, 512), meta=np.ndarray>with timer("Extracting values for a point from the lazycogs array"):
vals = (
lazycogs_da.sel(x=299965, y=2653947, method="nearest")
.sel(time=slice("2025-06-01", "2025-06-30"))
.compute()
)
vals
Extracting values for a point from the lazycogs array took 7.60 seconds
<xarray.DataArray (band: 3, time: 30)> Size: 180B
array([[ 0, 2578, 0, 2688, 0, 10193, 2548, 0, 11786,
0, 0, 8685, 0, 8705, 0, 0, 3066, 0,
11911, 0, 0, 8338, 0, 2598, 0, 8146, 11202,
0, 2503, 0],
[ 0, 2482, 0, 2521, 0, 10286, 2439, 0, 12660,
0, 0, 8963, 0, 8460, 0, 0, 3109, 0,
12683, 0, 0, 8437, 0, 2477, 0, 8041, 11309,
0, 2480, 0],
[ 0, 2326, 0, 2307, 0, 10744, 2210, 0, 14304,
0, 0, 9378, 0, 8434, 0, 0, 2949, 0,
14094, 0, 0, 8874, 0, 2227, 0, 8012, 11754,
0, 2258, 0]], dtype=int16)
Coordinates:
* band (band) <U5 60B 'red' 'green' 'blue'
* time (time) datetime64[s] 240B 2025-06-01 2025-06-02 ... 2025-06-30
y float64 8B 2.654e+06
x float64 8B 3e+05
Attributes:
_stac_backends: [StacBackendArray(band='red', shape=(121, 7100, 13000...
_stac_time_coords: TimeCoords([2025-06-01 … 2025-09-30], n=121)with timer("Extracting values for a point from the odc-stac array"):
vals = (
odc_stac_da.sel(x=299965, y=2653947, method="nearest")
.sel(time=slice("2025-06-01", "2025-06-30"))
.to_dataarray(dim="band")
.compute()
)
vals
/home/henry/workspace/hrodmn/lazycogs/.venv/lib/python3.13/site-packages/rasterio/warp.py:385: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned. dest = _reproject(
Extracting values for a point from the odc-stac array took 17.09 seconds
<xarray.DataArray (band: 3, time: 30)> Size: 180B
array([[ 0, 2578, 0, 2688, 0, 10193, 2548, 0, 9976,
0, 0, 8685, 0, 8705, 0, 0, 3066, 0,
11911, 0, 0, 8338, 0, 2598, 0, 8146, 11202,
0, 2731, 0],
[ 0, 2482, 0, 2521, 0, 10286, 2439, 0, 11075,
0, 0, 8963, 0, 8460, 0, 0, 3109, 0,
12683, 0, 0, 8437, 0, 2477, 0, 8041, 11309,
0, 2721, 0],
[ 0, 2326, 0, 2307, 0, 10744, 2210, 0, 12875,
0, 0, 9378, 0, 8434, 0, 0, 2949, 0,
14094, 0, 0, 8874, 0, 2227, 0, 8012, 11754,
0, 2586, 0]], dtype=uint16)
Coordinates:
* band (band) object 24B 'red' 'green' 'blue'
* time (time) datetime64[us] 240B 2025-06-01T17:00:18.294000 ... 20...
y float64 8B 2.654e+06
x float64 8B 3e+05
spatial_ref int32 4B 5070with timer("Plotting three time points for a small area with lazycogs"):
subset = lazycogs_da.sel(
x=slice(100_000, 400_000),
y=slice(2_600_000, 2_800_000),
).isel(time=slice(1, 4))
subset.plot.imshow(
rgb="band",
row="time",
vmin=0,
vmax=4000,
aspect=subset.shape[-1] / subset.shape[-2],
)
Estimated peak in-flight memory for bands=['red', 'green', 'blue'] is ~824 MB (12 concurrent reads × 3 bands × 3000x2000 px). Lower max_concurrent_reads or add spatial chunks to reduce memory use.
Plotting three time points for a small area with lazycogs took 17.16 seconds
with timer("Plotting three time points for a small area with odc-stac"):
subset = (
odc_stac_da.sel(
x=slice(100_000, 400_000),
y=slice(2_800_000, 2_600_000), # y axis is flipped
)
.isel(time=slice(1, 4))
.to_dataarray(dim="band")
)
subset.plot.imshow(
rgb="band",
row="time",
vmin=0,
vmax=4000,
aspect=subset.shape[-1] / subset.shape[-2],
)
Plotting three time points for a small area with odc-stac took 35.41 seconds