Introduction to STAC Part 1: An Overview of the Specification

last updated: May 02, 2024

The goal of this tutorial is to provide an overview of the STAC specification, a breakdown of each STAC component, and an explanation of how the components fit together.


What is STAC?

The SpatioTemporal Asset Catalog (STAC) specification was designed to establish a standard, unified language for describing geospatial data, making that data easier to search and query.

An overarching goal of this common standard is to eliminate the need to dig through the APIs of many different satellite data providers in order to find and access all the data you need.

STAC's design is simple and extensible because of how it is structured. STAC is a network of JSON files that reference other JSON files. Each JSON file adheres to a core specification determined by the STAC component it describes. This core JSON format can also be extended to fit differing needs, which makes the STAC specification highly flexible and adaptable.
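Extensions are one concrete form of this flexibility. An object declares an extension by listing its JSON Schema URL in its `stac_extensions` array, and the fields the extension adds are namespaced with a prefix. The sketch below shows a trimmed-down Item fragment that uses the Electro-Optical (`eo`) extension's `eo:cloud_cover` field. The extension and field are real, but the schema version, ID, and values are only illustrative.

```python
import json

# A trimmed-down Item fragment (not a complete, valid Item) that declares an extension.
item_fragment = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": [
        # Example schema URL for the Electro-Optical (eo) extension; check the
        # extension's own documentation for the version you actually need.
        "https://stac-extensions.github.io/eo/v1.1.0/schema.json"
    ],
    "id": "example-item",  # hypothetical ID
    "properties": {
        "datetime": "2024-05-01T10:30:00Z",
        "eo:cloud_cover": 12.5,  # extension fields carry a namespace prefix
    },
}

# Extension fields are ordinary JSON keys, so any JSON tool can read them.
prefixes = sorted({key.split(":")[0] for key in item_fragment["properties"] if ":" in key})
print(prefixes)
print(json.dumps(item_fragment["properties"], indent=2))
```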


STAC Components

The four key components of STAC are items, catalogs, collections, and the STAC API. These components can be used independently of one another, but they work best together.

STAC Item

A STAC Item is the foundational building block of STAC. It is a GeoJSON Feature supplemented with additional metadata that enables clients to search and traverse catalogs.
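The sketch below builds a minimal Item as a plain Python dictionary to show which parts come from GeoJSON and which STAC adds. The GeoJSON parts are `type`, `geometry`, and `properties`. STAC adds `stac_version`, `id`, `bbox`, `properties.datetime`, `links`, and `assets`. The scene ID, footprint, and asset URL are made-up placeholders. The 2D `bbox` is computed from the polygon as `[min_lon, min_lat, max_lon, max_lat]`.

```python
import json

# Footprint of a hypothetical scene as a GeoJSON Polygon (lon, lat order).
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [-105.10, 39.60],
        [-104.90, 39.60],
        [-104.90, 39.80],
        [-105.10, 39.80],
        [-105.10, 39.60],  # a GeoJSON ring closes on its first point
    ]],
}

# For 2D data the bbox is [min_lon, min_lat, max_lon, max_lat].
ring = geometry["coordinates"][0]
lons = [pt[0] for pt in ring]
lats = [pt[1] for pt in ring]
bbox = [min(lons), min(lats), max(lons), max(lats)]

item = {
    "type": "Feature",           # a STAC Item is a GeoJSON Feature...
    "stac_version": "1.0.0",
    "id": "example-scene-001",   # hypothetical ID
    "geometry": geometry,
    "bbox": bbox,
    "properties": {"datetime": "2024-05-01T17:30:00Z"},
    "links": [],                 # ...plus STAC-specific links and assets
    "assets": {
        "visual": {
            # placeholder URL for a Cloud-Optimized GeoTIFF
            "href": "https://example.com/scenes/example-scene-001/visual.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["visual"],
        }
    },
}

print(json.dumps(item["bbox"]))
```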

[Figure: geojson.png]

[Figure: item_json.png]

In [1]:
import pystac
In [2]:
print(pystac.Item.__doc__)
An Item is the core granular entity in a STAC, containing the core metadata
    that enables any client to search or crawl online catalogs of spatial 'assets' -
    satellite imagery, derived data, DEM's, etc.

    Args:
        id (str): Provider identifier. Must be unique within the STAC.
        geometry (dict): Defines the full footprint of the asset represented by this item,
            formatted according to `RFC 7946, section 3.1 (GeoJSON)
            <https://tools.ietf.org/html/rfc7946>`_.
        bbox (List[float] or None):  Bounding Box of the asset represented by this item using
            either 2D or 3D geometries. The length of the array must be 2*n where n is the
            number of dimensions. Could also be None in the case of a null geometry.
        datetime (datetime or None): Datetime associated with this item. If None,
            a start_datetime and end_datetime must be supplied in the properties.
        properties (dict): A dictionary of additional metadata for the item.
        stac_extensions (List[str]): Optional list of extensions the Item implements.
        href (str or None): Optional HREF for this item, which be set as the item's
            self link's HREF.
        collection (Collection or str): The Collection or Collection ID that this item
            belongs to.
        extra_fields (dict or None): Extra fields that are part of the top-level JSON properties
            of the Item.

    Attributes:
        id (str): Provider identifier. Unique within the STAC.
        geometry (dict): Defines the full footprint of the asset represented by this item,
            formatted according to `RFC 7946, section 3.1 (GeoJSON)
            <https://tools.ietf.org/html/rfc7946>`_.
        bbox (List[float] or None):  Bounding Box of the asset represented by this item using
            either 2D or 3D geometries. The length of the array is 2*n where n is the
            number of dimensions. Could also be None in the case of a null geometry.
        datetime (datetime or None): Datetime associated with this item. If None,
            the start_datetime and end_datetime in the common_metadata
            will supply the datetime range of the Item.
        properties (dict): A dictionary of additional metadata for the item.
        stac_extensions (List[str] or None): Optional list of extensions the Item implements.
        collection (Collection or None): Collection that this item is a part of.
        links (List[Link]): A list of :class:`~pystac.Link` objects representing
            all links associated with this STACObject.
        assets (Dict[str, Asset]): Dictionary of asset objects that can be downloaded,
            each with a unique key.
        collection_id (str or None): The Collection ID that this item belongs to, if any.
        extra_fields (dict or None): Extra fields that are part of the top-level JSON properties
            of the Item.
    

To learn more about the STAC Item specification: https://github.com/radiantearth/stac-spec/tree/master/item-spec
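The docstring above encodes two rules that are easy to check by hand:

- A `bbox` has 2*n numbers for n dimensions.
- An Item whose `datetime` is null must supply `start_datetime` and `end_datetime` instead.

The hypothetical helper below checks only those two rules on a plain dictionary. It is a sketch, not a substitute for full JSON Schema validation; PySTAC objects offer schema-based validation through their `validate()` method.

```python
def basic_item_checks(item: dict) -> list:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []

    bbox = item.get("bbox")
    # 2 * n values: 4 for 2D (n = 2), 6 for 3D (n = 3).
    if bbox is not None and len(bbox) not in (4, 6):
        problems.append(f"bbox has {len(bbox)} values; expected 4 (2D) or 6 (3D)")

    props = item.get("properties", {})
    if props.get("datetime") is None:
        # A null datetime is allowed only when a start/end range is given.
        if "start_datetime" not in props or "end_datetime" not in props:
            problems.append("datetime is null but start_datetime/end_datetime are missing")

    return problems


ok_item = {
    "bbox": [-105.1, 39.6, -104.9, 39.8],
    "properties": {
        "datetime": None,
        "start_datetime": "2024-05-01T00:00:00Z",
        "end_datetime": "2024-05-31T23:59:59Z",
    },
}
bad_item = {"bbox": [-105.1, 39.6, -104.9], "properties": {"datetime": None}}

print(basic_item_checks(ok_item))   # []
print(basic_item_checks(bad_item))  # two problems: bad bbox length, missing range
```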


STAC Catalog

A Catalog is usually the starting point for navigating a STAC. A catalog.json file contains links to some combination of other catalogs, collections, and/or items. This combination is flexible and depends on how the data is organized. A catalog may reference only a group of items, it may link to other subcatalogs and no collections, or it may link to a combination of catalogs and collections.

We can think of it like a directory tree on a computer.
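To make the directory-tree analogy concrete, the sketch below stores a tiny catalog as a dictionary that maps hypothetical file paths to their JSON contents. It then walks the catalog depth-first by following `child` and `item` links, roughly as a client would crawl a static catalog. All paths and IDs are made up. For simplicity, the hrefs here are written relative to the catalog root; in a real catalog, a relative href resolves against the location of the file that contains it.

```python
# Hypothetical in-memory "file system": path -> parsed JSON document.
# Simplification: hrefs are relative to the catalog root, not to each file.
docs = {
    "catalog.json": {
        "type": "Catalog", "id": "root",
        "links": [
            {"rel": "child", "href": "landsat/collection.json"},
            {"rel": "child", "href": "planet/catalog.json"},
        ],
    },
    "landsat/collection.json": {
        "type": "Collection", "id": "landsat",
        "links": [{"rel": "item", "href": "landsat/scene-1.json"}],
    },
    "planet/catalog.json": {
        "type": "Catalog", "id": "planet",
        "links": [{"rel": "item", "href": "planet/scene-2.json"}],
    },
    "landsat/scene-1.json": {"type": "Feature", "id": "scene-1", "links": []},
    "planet/scene-2.json": {"type": "Feature", "id": "scene-2", "links": []},
}


def walk(path: str, depth: int = 0, out=None) -> list:
    """Depth-first traversal that records one indented line per document."""
    if out is None:
        out = []
    doc = docs[path]
    out.append("  " * depth + f"{doc['type']}: {doc['id']}")
    for link in doc["links"]:
        if link["rel"] in ("child", "item"):  # follow only downward links
            walk(link["href"], depth + 1, out)
    return out


print("\n".join(walk("catalog.json")))
```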

[Figure: catalog_json.png]

STAC Catalog Relation and Media Types

Self: Absolute URL to the location where this JSON file can be found online, if available (strongly recommended)

Root: URL to the root catalog or collection

Parent: URL to the parent catalog or collection

Child: URL to a child catalog or collection

Item: URL to a STAC Item
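In the JSON itself, each relationship is an entry in the `links` array. Its `rel` field holds a lowercase value such as `self`, `root`, `parent`, or `child`, and links to Items use `item`. A relative `href` is resolved against the location of the file that contains it. The sketch below shows that resolution with Python's standard `urllib.parse.urljoin`. The base URL and paths are placeholders.

```python
from urllib.parse import urljoin

# Links from a hypothetical sub-catalog stored at this placeholder URL.
self_href = "https://example.com/stac/planet/catalog.json"
links = [
    {"rel": "self", "href": self_href},
    {"rel": "root", "href": "../catalog.json"},
    {"rel": "parent", "href": "../catalog.json"},
    {"rel": "child", "href": "./2024/catalog.json"},
    {"rel": "item", "href": "./scene-2/scene-2.json"},
]

# Resolve each href against the location of this file.
# (A dict keyed by rel is fine here because each rel appears only once.)
resolved = {link["rel"]: urljoin(self_href, link["href"]) for link in links}
for rel, href in resolved.items():
    print(f"{rel:>6}: {href}")
```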

[Figure: catalog_links.png]

In [4]:
print(pystac.Catalog.__doc__)
A PySTAC Catalog represents a STAC catalog in memory.

    A Catalog is a :class:`~pystac.STACObject` that may contain children,
    which are instances of :class:`~pystac.Catalog` or :class:`~pystac.Collection`,
    as well as :class:`~pystac.Item` s.

    Args:
        id (str): Identifier for the catalog. Must be unique within the STAC.
        description (str): Detailed multi-line description to fully explain the catalog.
            `CommonMark 0.28 syntax <http://commonmark.org/>`_ MAY be used for rich text
            representation.
        title (str or None): Optional short descriptive one-line title for the catalog.
        stac_extensions (List[str]): Optional list of extensions the Catalog implements.
        href (str or None): Optional HREF for this catalog, which be set as the catalog's
            self link's HREF.
        catalog_type (str or None): Optional catalog type for this catalog. Must
            be one of the values in :class`~pystac.CatalogType`.

    Attributes:
        id (str): Identifier for the catalog.
        description (str): Detailed multi-line description to fully explain the catalog.
        title (str or None): Optional short descriptive one-line title for the catalog.
        stac_extensions (List[str] or None): Optional list of extensions the Catalog implements.
        extra_fields (dict or None): Extra fields that are part of the top-level JSON properties
            of the Catalog.
        links (List[Link]): A list of :class:`~pystac.Link` objects representing
            all links associated with this Catalog.
        catalog_type (str or None): The catalog type, or None if not known.
    

To learn more about the STAC Catalog specification: https://github.com/radiantearth/stac-spec/tree/master/catalog-spec


STAC Collection

A STAC Collection builds on the STAC Catalog specification by adding metadata about the set of items that belong to the collection, such as their spatial and temporal extent, license, keywords, and providers.
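One piece of Collection metadata that is easy to get wrong is the `extent`. Its spatial part is a list of bounding boxes and its temporal part is a list of `[start, end]` intervals, with the first entry in each list covering all member items. The sketch below derives that overall extent from a few hypothetical item bboxes and datetimes and places it in a minimal Collection dictionary. The IDs, license, and values are examples only.

```python
# bboxes and datetimes of three hypothetical items in the collection
item_bboxes = [
    [-105.10, 39.60, -104.90, 39.80],
    [-104.95, 39.70, -104.70, 39.95],
    [-105.30, 39.50, -105.05, 39.75],
]
item_datetimes = ["2024-05-03T17:30:00Z", "2024-05-01T17:29:00Z", "2024-05-09T17:31:00Z"]

# The union of 2D bboxes: smallest minima, largest maxima.
union_bbox = [
    min(b[0] for b in item_bboxes),
    min(b[1] for b in item_bboxes),
    max(b[2] for b in item_bboxes),
    max(b[3] for b in item_bboxes),
]

collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-collection",  # hypothetical ID
    "description": "Example scenes over a single area of interest.",
    "license": "CC-BY-4.0",      # an SPDX license identifier
    "extent": {
        # Lists of bboxes and intervals; the first entry covers all items.
        "spatial": {"bbox": [union_bbox]},
        # ISO 8601 strings with the same format and UTC zone sort chronologically.
        "temporal": {"interval": [[min(item_datetimes), max(item_datetimes)]]},
    },
    "links": [],
}
print(collection["extent"])
```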

[Figure: collection.png]

In [5]:
print(pystac.Collection.__doc__)
A Collection extends the Catalog spec with additional metadata that helps
    enable discovery.

    Args:
        id (str): Identifier for the collection. Must be unique within the STAC.
        description (str): Detailed multi-line description to fully explain the collection.
            `CommonMark 0.28 syntax <http://commonmark.org/>`_ MAY be used for rich text
            representation.
        extent (Extent): Spatial and temporal extents that describe the bounds of
            all items contained within this Collection.
        title (str or None): Optional short descriptive one-line title for the collection.
        stac_extensions (List[str]): Optional list of extensions the Collection implements.
        href (str or None): Optional HREF for this collection, which be set as the collection's
            self link's HREF.
        catalog_type (str or None): Optional catalog type for this catalog. Must
            be one of the values in :class`~pystac.CatalogType`.
        license (str):  Collection's license(s) as a `SPDX License identifier
            <https://spdx.org/licenses/>`_, `various`, or `proprietary`. If collection includes
            data with multiple different licenses, use `various` and add a link for each.
            Defaults to 'proprietary'.
        keywords (List[str]): Optional list of keywords describing the collection.
        providers (List[Provider]): Optional list of providers of this Collection.
        properties (dict): Optional dict of common fields across referenced items.
        summaries (dict): An optional map of property summaries,
            either a set of values or statistics such as a range.
        extra_fields (dict or None): Extra fields that are part of the top-level JSON properties
            of the Collection.

    Attributes:
        id (str): Identifier for the collection.
        description (str): Detailed multi-line description to fully explain the collection.
        extent (Extent): Spatial and temporal extents that describe the bounds of
            all items contained within this Collection.
        title (str or None): Optional short descriptive one-line title for the collection.
        stac_extensions (List[str]): Optional list of extensions the Collection implements.
        keywords (List[str] or None): Optional list of keywords describing the collection.
        providers (List[Provider] or None): Optional list of providers of this Collection.
        properties (dict or None): Optional dict of common fields across referenced items.
        summaries (dict or None): An optional map of property summaries,
            either a set of values or statistics such as a range.
        links (List[Link]): A list of :class:`~pystac.Link` objects representing
            all links associated with this Collection.
        extra_fields (dict or None): Extra fields that are part of the top-level JSON properties
            of the Catalog.
    

To learn more about the STAC Collection specification: https://github.com/radiantearth/stac-spec/tree/master/collection-spec


Dynamic versus Static STAC Catalogs

A STAC Catalog can be static: a set of JSON files stored in local directories, on file servers, or in cloud object storage such as Amazon Simple Storage Service (Amazon S3) or Google Cloud Storage. A dynamic catalog, by contrast, is generated on the fly by a server, typically in response to queries made through an API.

Because static STAC Catalogs are just files, they are highly portable and reliable, and they provide a solid foundation for building dynamic catalogs served through APIs.
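Because a static catalog is nothing more than files, a few lines of standard-library Python are enough to write one and crawl it back. The sketch below writes a root catalog and one item into a temporary directory using relative links. It then reloads the item by following the catalog's `item` link. The IDs and layout are hypothetical, and this only illustrates the idea; PySTAC, used in the next tutorial, manages links and saving for you.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical minimal documents; hrefs are relative to the file containing them.
    catalog = {
        "type": "Catalog", "stac_version": "1.0.0", "id": "example-catalog",
        "description": "A tiny static catalog.",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
            {"rel": "item", "href": "./scene-1/scene-1.json", "type": "application/geo+json"},
        ],
    }
    item = {
        "type": "Feature", "stac_version": "1.0.0", "id": "scene-1",
        "geometry": None,  # a null geometry is allowed; bbox is then omitted
        "properties": {"datetime": "2024-05-01T17:30:00Z"},
        "links": [
            {"rel": "root", "href": "../catalog.json", "type": "application/json"},
            {"rel": "parent", "href": "../catalog.json", "type": "application/json"},
        ],
        "assets": {},
    }

    # Write <tmp>/catalog.json and <tmp>/scene-1/scene-1.json
    (root / "catalog.json").write_text(json.dumps(catalog, indent=2))
    (root / "scene-1").mkdir()
    (root / "scene-1" / "scene-1.json").write_text(json.dumps(item, indent=2))

    # Crawl: open the root catalog, then follow each item link relative to it.
    loaded_catalog = json.loads((root / "catalog.json").read_text())
    item_ids = []
    for link in loaded_catalog["links"]:
        if link["rel"] == "item":
            item_path = (root / link["href"]).resolve()
            item_ids.append(json.loads(item_path.read_text())["id"])

    print(item_ids)
```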

STAC API

This leads us to STAC APIs, the last component of the STAC specification. A STAC API is a RESTful API specification for querying STAC catalogs dynamically. It defines a standard set of endpoints for searching catalogs, collections, and items.
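For example, the STAC API's Item Search is exposed at a `/search` endpoint. It accepts parameters such as `bbox`, `datetime`, `collections`, and `limit`, either as a GET query string or as a POST JSON body. The sketch below only builds the request with the standard library and makes no network call. The API root and collection ID are placeholders. A dedicated client such as `pystac-client` would also handle paging and response parsing.

```python
from urllib.parse import urlencode

api_root = "https://example.com/stac/v1"  # placeholder STAC API root

params = {
    # GET form: bbox as comma-separated min_lon,min_lat,max_lon,max_lat
    "bbox": ",".join(str(v) for v in [-105.1, 39.6, -104.9, 39.8]),
    # RFC 3339 interval; ".." can stand in for an open-ended side
    "datetime": "2024-05-01T00:00:00Z/2024-05-31T23:59:59Z",
    "collections": "example-collection",  # hypothetical collection ID
    "limit": 10,
}

# urlencode percent-encodes characters such as ",", ":" and "/".
search_url = f"{api_root}/search?{urlencode(params)}"
print(search_url)

# The equivalent POST /search body carries the same fields as JSON,
# with bbox as a list of numbers and collections as a list of strings.
post_body = {
    "bbox": [-105.1, 39.6, -104.9, 39.8],
    "datetime": "2024-05-01T00:00:00Z/2024-05-31T23:59:59Z",
    "collections": ["example-collection"],
    "limit": 10,
}
```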

You can find details in the API specification here: https://github.com/radiantearth/stac-api-spec


Next Steps

In the next tutorial, we will demonstrate how to generate a simple, static STAC Catalog of some Planet imagery using PySTAC!