open

lazycogs.open

open(href: str, *, datetime: str | None = None, bbox: tuple[float, float, float, float], crs: str | CRS, resolution: float, filter: str | dict[str, Any] | None = None, ids: list[str] | None = None, bands: list[str] | None = None, chunks: dict[str, int] | None = None, sortby: list[str] | None = None, nodata: float | None = None, dtype: str | dtype | None = None, mosaic_method: type[MosaicMethodBase] | None = None, time_period: str = 'P1D', store: ObjectStore | None = None, max_concurrent_reads: int = 32, path_from_href: Callable[[str], str] | None = None, duckdb_client: DuckdbClient | None = None) -> DataArray

Open a mosaic of STAC items as a lazy (time, band, y, x) DataArray.

Synchronous entry point. Works in both regular Python scripts and Jupyter notebooks. When called from inside a running event loop (e.g. a Jupyter kernel), the coroutine is dispatched to a background thread with its own event loop, so the caller does not need to await anything. If you are already in an async context and want to skip the thread overhead, call open_async directly.
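The dispatch behaviour described above can be illustrated with a generic pattern. This is a minimal sketch using only the standard library, not the lazycogs implementation; run_sync and fake_open are hypothetical names introduced for illustration.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread (plain script): run directly.
        return asyncio.run(coro)

    # A loop is already running (e.g. a Jupyter kernel). asyncio.run() would
    # raise here, so run the coroutine in a worker thread with its own loop.
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # captured and re-raised in the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join()  # blocks the calling thread until the coroutine finishes
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_open() -> str:
    # Stand-in for an async entry point; not part of lazycogs.
    await asyncio.sleep(0)
    return "opened"


async def call_from_running_loop() -> str:
    # Simulates calling the synchronous wrapper from inside a running loop.
    return run_sync(fake_open())
```

Calling run_sync(fake_open()) from a script takes the direct path, while asyncio.run(call_from_running_loop()) exercises the background-thread path. The thread hop is the overhead that calling the async entry point directly avoids.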

href must be a path to a geoparquet file (.parquet or .geoparquet) or, when duckdb_client is provided, to a hive-partitioned parquet directory.

Parameters:

Name Type Description Default
href str

Path to a geoparquet file (.parquet or .geoparquet) or a hive-partitioned parquet directory when duckdb_client is provided with use_hive_partitioning=True.

required
datetime str | None

RFC 3339 datetime or range (e.g. "2023-01-01/2023-12-31") used to pre-filter items from the parquet.

None
bbox tuple[float, float, float, float]

(minx, miny, maxx, maxy) in the target crs.

required
crs str | CRS

Target output CRS.

required
resolution float

Output pixel size in crs units.

required
filter str | dict[str, Any] | None

CQL2 filter expression (text string or JSON dict) forwarded to DuckDB queries, e.g. "eo:cloud_cover < 20".

None
ids list[str] | None

STAC item IDs to restrict the search to.

None
bands list[str] | None

Asset keys to include. If None, auto-detected from the first matching item.

None
chunks dict[str, int] | None

Chunk sizes passed to DataArray.chunk(). If None (default), returns a LazilyIndexedArray-backed DataArray where only the requested pixels are fetched on each access — ideal for point or small-region queries. Pass an explicit dict to convert to a dask-backed array for parallel computation over larger regions.

None
sortby list[str] | None

Sort keys forwarded to DuckDB queries.

None
nodata float | None

No-data fill value for output arrays.

None
dtype str | dtype | None

Output array dtype. Defaults to float32.

None
mosaic_method type[MosaicMethodBase] | None

Mosaic method class (not instance) to use. Defaults to lazycogs._mosaic_methods.FirstMethod.

None
time_period str

ISO 8601 duration string controlling how items are grouped into time steps. Supported forms: PnD (days), P1W (ISO calendar week), P1M (calendar month), P1Y (calendar year). Defaults to "P1D" (one step per calendar day), which preserves the previous behaviour. Multi-day windows such as "P16D" are aligned to an epoch of 2000-01-01.

'P1D'
store ObjectStore | None

Pre-configured obstore ObjectStore instance to use for all asset reads. Useful when credentials, custom endpoints, or non-default options are needed without relying on automatic store resolution from each HREF. When None (default), each asset URL is parsed to create or reuse a per-thread cached store.

None
max_concurrent_reads int

Maximum number of COG reads to run concurrently per chunk. See open_async for full documentation. Defaults to 32.

32
path_from_href Callable[[str], str] | None

Optional callable (href: str) -> str that extracts the object path from an asset HREF. See open_async for full documentation.

None
duckdb_client DuckdbClient | None

Optional DuckdbClient instance. When None (default), a plain DuckdbClient() is created. See open_async for full documentation.

None

Returns:

Type Description
DataArray

Lazy xr.DataArray with dimensions (time, band, y, x).

Raises:

Type Description
ValueError

If href is not a .parquet or .geoparquet file and no duckdb_client is provided, if no matching items are found, or if time_period is not a recognised ISO 8601 duration.
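The time_period description says that multi-day windows such as "P16D" are aligned to an epoch of 2000-01-01. A minimal sketch of that alignment follows. It assumes each window is identified by its first day; how lazycogs labels a time step is not stated here. parse_days and window_start are hypothetical helpers, not library functions.

```python
import re
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # alignment epoch named in the time_period description


def parse_days(period: str) -> int:
    """Parse a PnD duration such as "P16D" into a day count (hypothetical helper)."""
    match = re.fullmatch(r"P(\d+)D", period)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"not a recognised PnD duration: {period!r}")
    return int(match.group(1))


def window_start(day: date, period: str) -> date:
    """Return the first day of the epoch-aligned window that contains `day`."""
    n_days = parse_days(period)
    offset = (day - EPOCH).days
    # Floor division also handles dates before the epoch correctly.
    return EPOCH + timedelta(days=(offset // n_days) * n_days)


print(window_start(date(2000, 1, 16), "P16D"))  # last day of the first window
print(window_start(date(2000, 1, 17), "P16D"))  # first day of the second window
```

Because the windows are anchored to a fixed epoch rather than to the first item found, two calls with different datetime ranges produce windows that line up with each other.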

lazycogs.open_async async

open_async(href: str, *, datetime: str | None = None, bbox: tuple[float, float, float, float], resolution: float, crs: str | CRS, filter: str | dict[str, Any] | None = None, ids: list[str] | None = None, bands: list[str] | None = None, chunks: dict[str, int] | None = None, sortby: list[str] | None = None, nodata: float | None = None, dtype: str | dtype | None = None, mosaic_method: type[MosaicMethodBase] | None = None, time_period: str = 'P1D', store: ObjectStore | None = None, max_concurrent_reads: int = 32, path_from_href: Callable[[str], str] | None = None, duckdb_client: DuckdbClient | None = None) -> DataArray

Open a mosaic of STAC items as a lazy (time, band, y, x) DataArray.

Async entry point, suitable for use with await in Jupyter notebooks and other async contexts. For synchronous scripts, use open.

href must be a path to a geoparquet file (.parquet or .geoparquet) or, when duckdb_client is provided, to a hive-partitioned parquet directory.

Phase 0 work (runs at call time):

  1. Query the geoparquet index via DuckDB to discover bands and unique time steps (applying bbox, datetime, filter, and ids so the time axis contains no empty slices).
  2. Compute the output grid (affine transform + coordinate arrays).
  3. Create one StacBackendArray per band, wrapped in a LazilyIndexedArray; no pixel I/O happens yet.
  4. Assemble an xr.Dataset, convert to xr.DataArray, and optionally chunk with dask.
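Step 2 above (the output grid) can be sketched from bbox and resolution alone. This is an illustration under assumptions, not the library's code: it assumes a north-up grid, pixel-centre coordinates, and rounding the pixel count up. The lazycogs snapping rules may differ, and output_grid is a hypothetical name.

```python
import math


def output_grid(bbox, resolution):
    """Return (transform, xs, ys) for a north-up grid covering bbox (hypothetical)."""
    minx, miny, maxx, maxy = bbox
    # Round up so the grid fully covers the bbox; the small tolerance guards
    # against float noise such as 11.000000000000002 becoming 12 pixels.
    width = math.ceil((maxx - minx) / resolution - 1e-9)
    height = math.ceil((maxy - miny) / resolution - 1e-9)
    # Affine coefficients in (a, b, c, d, e, f) order, where
    # x = a * col + b * row + c and y = d * col + e * row + f.
    transform = (resolution, 0.0, minx, 0.0, -resolution, maxy)
    # Coordinates refer to pixel centres; y decreases from the top row down.
    xs = [minx + (col + 0.5) * resolution for col in range(width)]
    ys = [maxy - (row + 0.5) * resolution for row in range(height)]
    return transform, xs, ys


transform, xs, ys = output_grid((0.0, 0.0, 100.0, 50.0), 10.0)
print(len(xs), len(ys))  # grid width and height in pixels
```

Only the bounding box and pixel size are needed, which is why this step runs at call time without reading any raster data.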

Parameters:

Name Type Description Default
href str

Path to a geoparquet file (.parquet or .geoparquet) or a hive-partitioned parquet directory when duckdb_client is provided with use_hive_partitioning=True.

required
datetime str | None

RFC 3339 datetime or range (e.g. "2023-01-01/2023-12-31") used to pre-filter items from the parquet.

None
bbox tuple[float, float, float, float]

(minx, miny, maxx, maxy) in the target crs.

required
crs str | CRS

Target output CRS.

required
resolution float

Output pixel size in crs units.

required
filter str | dict[str, Any] | None

CQL2 filter expression (text string or JSON dict) forwarded to DuckDB queries, e.g. "eo:cloud_cover < 20".

None
ids list[str] | None

STAC item IDs to restrict the search to.

None
bands list[str] | None

Asset keys to include. If None, auto-detected from the first matching item.

None
chunks dict[str, int] | None

Chunk sizes passed to DataArray.chunk(). If None (default), returns a LazilyIndexedArray-backed DataArray where only the requested pixels are fetched on each access — ideal for point or small-region queries. Pass an explicit dict to convert to a dask-backed array for parallel computation over larger regions.

None
sortby list[str] | None

Sort keys forwarded to DuckDB queries.

None
nodata float | None

No-data fill value for output arrays.

None
dtype str | dtype | None

Output array dtype. Defaults to float32.

None
mosaic_method type[MosaicMethodBase] | None

Mosaic method class (not instance) to use. Defaults to lazycogs._mosaic_methods.FirstMethod.

None
time_period str

ISO 8601 duration string controlling how items are grouped into time steps. Supported forms: PnD (days), P1W (ISO calendar week), P1M (calendar month), P1Y (calendar year). Defaults to "P1D" (one step per calendar day), which preserves the previous behaviour. Multi-day windows such as "P16D" are aligned to an epoch of 2000-01-01.

'P1D'
store ObjectStore | None

Pre-configured obstore ObjectStore instance to use for all asset reads. Useful when credentials, custom endpoints, or non-default options are needed without relying on automatic store resolution from each HREF. When None (default), each asset URL is parsed to create or reuse a per-thread cached store.

None
max_concurrent_reads int

Maximum number of COG reads to run concurrently per chunk. Items are processed in batches of this size, which bounds peak in-flight memory when a chunk overlaps many files. Methods that support early exit (e.g. the default lazycogs._mosaic_methods.FirstMethod) will stop reading once every output pixel is filled, so lower values also reduce unnecessary I/O on dense datasets. Defaults to 32.

32
path_from_href Callable[[str], str] | None

Optional callable (href: str) -> str that extracts the object path from an asset HREF. When provided, it replaces the default urlparse-based extraction used in lazycogs._store.resolve. Most useful when combined with a custom store whose root does not align with the URL path structure of the asset HREFs.

Example: NASA LP DAAC HTTPS proxy URL for an S3 asset:

from urllib.parse import urlparse

import lazycogs
from obstore.store import S3Store

store = S3Store(bucket="lp-prod-protected", ...)

def strip_bucket(href: str) -> str:
    # href: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/path/to/file.tif
    # store is rooted at the bucket, so the path is just path/to/file.tif
    return urlparse(href).path.lstrip("/").removeprefix("lp-prod-protected/")

da = lazycogs.open("items.parquet", ..., store=store, path_from_href=strip_bucket)
None
duckdb_client DuckdbClient | None

Optional DuckdbClient instance. When None (default), a plain DuckdbClient() is used, which is equivalent to the previous rustac.search_sync behaviour. Pass a custom client to enable features such as hive-partitioned datasets:

import lazycogs
from rustac import DuckdbClient

client = DuckdbClient(use_hive_partitioning=True)
da = lazycogs.open(
    "s3://bucket/stac/",
    duckdb_client=client,
    bbox=...,
    crs=...,
    resolution=...,
)
None

Returns:

Type Description
DataArray

Lazy xr.DataArray with dimensions (time, band, y, x).

Raises:

Type Description
ValueError

If href is not a .parquet or .geoparquet file and no duckdb_client is provided, if no matching items are found, or if time_period is not a recognised ISO 8601 duration.
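The max_concurrent_reads description (batches of bounded size, with early exit once every output pixel is filled) can be illustrated with plain asyncio. This is a sketch of the general pattern rather than the lazycogs reader. read_in_batches, read_one, and is_complete are hypothetical names, and "complete" here is a simple count instead of a pixel mask.

```python
import asyncio


async def read_in_batches(items, read_one, is_complete, max_concurrent_reads=32):
    """Read items batch by batch, stopping once is_complete(results) is true."""
    results = []
    for start in range(0, len(items), max_concurrent_reads):
        batch = items[start : start + max_concurrent_reads]
        # At most max_concurrent_reads reads are in flight at any time.
        results.extend(await asyncio.gather(*(read_one(item) for item in batch)))
        if is_complete(results):
            break  # early exit: the remaining items are never read
    return results


async def demo():
    touched = []  # records which items were actually read

    async def read_one(item):
        touched.append(item)
        await asyncio.sleep(0)  # stand-in for a COG read
        return item

    results = await read_in_batches(
        list(range(10)), read_one, lambda r: len(r) >= 4, max_concurrent_reads=2
    )
    return touched, results


touched, results = asyncio.run(demo())
print(f"read {len(touched)} of 10 items")
```

With a batch size of 2 the loop stops after the second batch. A batch size of 32 would have read all ten items before the completion check ran, which is the trade-off the parameter description points at for dense datasets.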