By: Sean Gillies on December 15 2022
While your area of study may be very complex, a simplified representation of it works best with Planet’s platform.
Introduction¶
Planet’s APIs use spatial information on both ends, input and output. You can filter data catalog search results by comparing items to an area of interest. You can point a SkySat at the region on Earth that you are studying. You can order products derived from these sources and have them clipped to spatial boundaries that you provide. Planet Data API search results are represented using GeoJSON features. This allows the footprints of, e.g., PlanetScope scenes, to be viewed on a map along with your area of interest.
The shapes of these areas of interest, regions, and boundaries could be simple as triangles and squares or could be less perfect and more intricate agricultural fields or watersheds. The complexity of these shapes can adversely impact your experience as much as their area does, adding time to your analysis . But don’t worry—this post will help you understand how to analyze and treat that complexity so that you can get the most out of Planet data.
What do we mean by complexity?¶
By and large, the shapes of features on Earth’s surface are represented in Planet APIs by GeoJSON geometry objects. A triangular shape on the ground, for example, is represented by a JSON object with two members. The first member is a coordinate array consisting of three unique points (pairs of numbers) plus the first point, repeated to explicitly close the coordinate sequence. The second member is type, which has the value Polygon.
{
"coordinates": [
[
[
-1.859709,
15.634462
],
[
-3.933013,
12.424525
],
[
-0.423789,
12.315333
],
[
-1.859709,
15.634462
]
]
],
"type": "Polygon"
}
By complexity of a shape, we mean the number of edge segments or the number of coordinate pairs that are the boundaries of the segments. A square polygon has five such coordinate pairs, which we will refer to as vertices. One coordinate pair is a vertex. More realistic polygons might have 100 vertices or more. A shape with more vertices is more complex than a shape with fewer. Imagine a copy of the triangle above, but with one additional vertex added midway along one side, collinear with the original points on that side. This new polygon has the same area as the original triangle and is more or less equivalent to the original as far as common algorithms are concerned. Its domain (internal connected region) is the same as the triangle’s domain. But it is more complex due to the extra vertex. Complexity increases with each additional vertex, and complexity comes with a cost.
Even more about complexity¶
Why did we say that the complexity of shapes adversely impacts your experience? The algorithm we use to implement Orders API clipping involves reducing your clipping shape into a set of triangles. The time it takes to compute this triangulation is on the order of n * log(n), where n is the number of vertices of the clipping shape. In other words, it takes about 15 times longer to clip using a 100-vertex shape compared to 10-vertex shape. And 13 times yet longer to scale from 100 to 1000 vertices. This is a fact of life for geographic information systems: even our best general-purpose geometry algorithms struggle with complex shapes.
Computing the bounding box of your shape—as is done in Data API search and Tasking API orders—takes time on the order of n, where n is again the number of vertices. Complexity affects these operations almost as much as it affects clipping. It takes about 10 times longer to compare a 100-vertex shape to Planet’s catalog compared to a 10-vertex shape.
The greatest impact on your experience, however, is due to the cost of copying and passing long arrays of coordinates throughout Planet’s systems. Data transfer, whether from disk to memory, or across a network, is orders of magnitude slower than operations in a computer’s CPU. Shape data volume increases with the number of shape vertices: this is a fact of life. The more complex your shapes, the slower your search, order, and subscription results.
What about the area of shapes? Is this property not significant? Planet’s catalog is spatially indexed. The time to find results increases only slightly with the area of your shape. The area of a shape relative to the ground resolution of Planet product pixels affects search response time mainly by yielding more results that must be serialized and sent back to you. The area of your shapes thus affects your experience, but it is a different problem from the one posed by shape complexity.
Planet APIs constrain the complexity of your shapes so that you don’t have the experience of being in computational limbo. Data volume is not directly correlated to the area of your shapes and so area is not explicitly limited. Data API search requests are limited to 1MB in size, which limits the number of vertices of search shapes in these requests. Orders and Subscriptions API clip shape vertices are limited.
However, the Orders or Subscriptions API clip tool is most valuable whenever you have areas of interest that are small compared to the size of Planet imagery scenes. Use it, because it reduces the time to apply other tools like bandmath and harmonization, and reduces the amount of data you need to download. The important thing is to manage the complexity of shapes that you provide in requests, using only as much as needed to get good results.
Managing complexity¶
Now that we’ve defined complexity and its impact, let’s consider how to manage complexity. Our goal with managing complexity is to create a simpler shape that covers the entire original area while increasing total area as little as possible.
Counting vertices¶
The first step is measuring the complexity of your area of interest. Specifically, this means counting the number of vertices. ArcGIS and QGIS allow this through their expression engines as !shape!.pointcount
and num_points($geometry)
, respectively. QGIS also exposes this as a derived attribute of all features. In PostGIS, you can use the function ST_NPoints
.
In a Python program you can count vertices using the Shapely library. Here is an example that uses recursion.
def vertex_count(obj) -> int:
""""Count the vertices of a GeoJSON-like geometry object.
Parameters
----------
obj: a GeoJSON-like mapping or an object that provides __geo_interface__
Returns
-------
int
"""
shp = shape(obj)
if hasattr(shp, "geoms"):
return sum(vertex_count(part) for part in shp.geoms)
elif hasattr(shp, "exterior"):
return vertex_count(shp.exterior) + sum(vertex_count(rng) for rng in shp.interiors)
else:
return len(shp.coords)
If your shape is within the vertex limit, you can use it as an Orders or Subscriptions API clip geometry with no modifications.
If your shape has fewer than about 20,000 vertices, you can use it for a Data API search with no modifications. This is because the Data API strictly limits search requests to 1MB of JSON text. You can assume each vertex, or coordinate pair, is represented in the GeoJSON by 40-46 bytes of text with each coordinate printed to 15 decimal places of precision (the default of OGR and QGIS). Given the 1MB limit and the amount of text required per vertex, this sets a rough upper limit of 20,000 vertices.
Reducing complexity¶
The recipe we suggest for reducing complexity is this: Dissolve internal boundaries in your area of interest. If that doesn’t bring your area of interest under the limit, compute the convex hull. *Buffer to compensate for concavities or use the latest in outer concavity algorithms. If your area of interest has too many vertices, the first thing to check is whether redundant interior vertices can be eliminated by dissolving parts of your area of interest into a larger whole. For example, your area of interest might be a set of adjacent districts, watersheds, or zones extracted from other imagery. These shapes may share edges and vertices along those edges. Consider this map of Rocky Mountain National Park’s (RMNP) wilderness patrol zones. These 25 zones share boundaries and have redundant vertices.
This area is interesting in that some of its boundaries follow natural features of the landscape and some are purely matters of real estate. It is several PlanetScope scenes high and about one scene wide. The 25 zones cover 1081 km² and are described by a total of 28,915 vertices. The vertices are spaced about 10-20 meters apart along the zone boundaries.
This area of interest is not suited for search or clipping in this form. It is too complex. QGIS’s dissolve geoprocessing tool can eliminate 24,302 of these vertices along the interior boundaries, yielding a single shape with 5613 vertices. ArcGIS also has a dissolve tool. If you’re managing areas of interest with PostGIS, you can use ST_UnaryUnion
. In Python, shapely provides unary_union
. All of these are capable of dissolving areas of interest before Planet searches, orders, or subscriptions are made.
An 80% reduction in the number of vertices without losing any area is a step in the right direction. You could search with this new shape, it is well under the 1 MB size limit, but 5613 is still beyond the vertex limits for other Planet APIs. Fortunately, we have more cards to play. We can use simpler approximations of our areas of interest.
Convex hull¶
The smallest convex set that contains a shape is called its convex hull. The boundary of the convex hull of your area of interest looks like an elastic band stretched around it. It is an excellent approximation for mostly convex shapes as it hugs (for example,headlands), but less excellent for shapes with concavities as it fills in (for example, bays).
A useful property of the convex hull is that it covers every point in your area of interest. When it comes to searching and clipping, this means that you will find every catalog item that you would have found with the shape of your complex area of interest and will get every bit of data in your order or subscription that you would get if you could use your exact area of interest. A convex hull is radically less complex than the shape it surrounds. In this example, it is only 43 vertices. 99.9% of the original complexity has been removed. Planet API requests using this simplified shape will be speedily processed.
On the other hand, a convex hull is necessarily larger in area than your area of interest. In the present example, we have a +18% increase in area. That translates to a small chance of false positive search results: finding a Planet scene which intersects a filled-in “bay,” but not any of the original zones. It also means you’ll download data in those same filled-in regions that you may not use. In this example, 191 km² more.
Convex hull, the sequel¶
It turns out that one way to get a better approximation of your area of interest is to use even more convex hulls. If you take the convex hull of each zone separately and then dissolve those 25 convex hulls, you get a shape that hugs the original boundary more tightly. 99.4% of the complexity has been removed while only increasing the area by 5.7% (62 km²). This union of hulls is a great fit for Data API searching and Orders and Subscriptions API clipping.
Simplified buffers¶
If your area of interest is dominated by concavities, the convex hull approach may not be useful at all. For example, the convex hull of an N or Z-shaped region is a full rectangle and maximizes the number of false positive search results and extra clipped data volume. In such cases, we’ve often recommended the approach of buffering and then simplifying areas of interest because the outcome can follow the outline of the original concave shape better than a convex hull can.
The difficulty with buffering and simplifying is that there are more parameters to adjust compared to the convex hull approximation. Success can vary greatly depending on these parameters and on the complexity of your area of interest.
By simplification, we mean removal of vertices from the boundary of a shape, leaving the most important ones. The Visvalingam–Whyatt algorithm (ref TODO) uses local area-based criteria for importance. The widely used Douglas-Peucker algorithm has global distance-based criteria.
Imagine the original vertices of the boundary of your area of interest and the line segments between them, and another smaller set of vertices and line segments which approximates that boundary. The Hausdorff distance is the distance that we would need to dilate each of these sets so that they mutually cover each other. Given a distance like this, the Douglas-Peucker algorithm finds a simplified set of vertices that has an equal Hausdorff distance. Both algorithms result in loss of boundary detail, erosion of convex corners, and filling of concave corners. Erosion of convex corners has an impact on Data API search and Orders and Subscriptions API clipping: searches may miss items in the catalog and pixels may be missing from delivered data. Thus, simplification must practically be preceded by buffering.
The shape buffering operations offered by ArcGIS, QGIS, PostGIS, and Shapely have parameters setting the number of buffer segments around shape corners and the distance of these segments from the original shape. The effect of buffering on complexity is somewhat complicated. A buffer operation is constructive and creates a new shape with an entirely new set of vertices. More vertices are necessarily created around corners of the original area of interest. If you buffer a point and specify five segments per quarter circle around the point (the QGIS default, it’s eight for PostGIS), you get a 20-sided polygon (an icosagon) with 20 vertices. For buffer distances that are large compared to the distance between vertices, you’ll observe that buffering reduces complexity. In the extreme, a buffer shape approaches that which you would get when buffering a point.
Such a large buffer is useless in practice. You’ll want to use buffer distances smaller than the size of the area of interest so as not to unnecessarily inflate its area. However, for buffer distances that are the same order of magnitude as the distance between original vertices, or smaller, you’ll observe that buffering tends to increase complexity. On convex corners, 1-10 extra vertices may be created. Inversely, some vertices may be removed at concave corners. For the relatively small buffer distances that we are likely to use, the net result is to increase complexity. Buffering the RMNP zones by a distance of 100 meters and using five segments per quarter-circle gives a shape which has 598 more vertices (+11%).
If you subsequently perform a Douglas-Peucker simplification (available from ArcGIS, PostGIS, QGIS, Python, and other software) with a tolerance less than or equal to the buffer distance, extra corner vertices will be removed and some non-corner vertices will also be removed. Using a tolerance greater than the buffer distance is likely to result in loss of some of your area of interest due to convex corner erosion.
The best practice for buffering, simplifying, and not creating too many new vertices is this: use a buffer distance larger than the distance between vertices of your original shape. simplify using a tolerance that is no larger than the buffer distance.
Outer concave hulls¶
The future of approximating the shapes of areas of interest which have both convexities and concavities is the outer concave hull. Version 3.11 of the GEOS library provides an outer concave hull algorithm. This feature is thus coming soon to QGIS, PostGIS, and Shapely 2.0.0.
Below is a short script which uses an unreleased version of Shapely (2.0rc1) to construct the concave hulls of Rocky Mountain National Park’s Wilderness Patrol Zones, dissolve them, and write the results to a GeoJSON file which can be opened and displayed in QGIS.
import json
import fiona
from shapely.geometry import mapping, shape
from shapely import concave_hull, unary_union
with fiona.open("Wilderness_Patrol_Zones.shp") as collection:
hulls = [concave_hull(shape(feat["geometry"]), ratio=0.4) for feat in collection]
dissolved_hulls = unary_union(hulls)
feat = dict(type="Feature", properties={}, geometry=mapping(dissolved_hulls))
with open(
"dissolved-concave-hulls.geojson",
"w",
) as f:
collection = json.dump({"type": "FeatureCollection", "features": [feat]}, f)
This concave hull hugs the area of interest more closely than the convex hull and still has the important property of an outer hull: every point in the original area of interest is covered.
Conclusion¶
When your area of interest is overly complex, there are tools available to handle the complexity and maximize the value of Planet's platform. Desktop GIS software and programming libraries can measure the complexity of your areas of interest and provide tools to help you simplify your shapes to meet the vertex limit of Planet APIs: convex hulls, simplified buffers, and concave hulls. Today, convex hulls and simplified buffers are the most accessible methods. In the future, concave hulls may make the less predictable simplified buffer approach obsolete.
Next Steps¶
Check out our new simplification helpers in Planet Explorer. Let us know of any interesting simplification workflows you’ve done by Tweeting us @PlanetDevs. Sign up for Wavelengths, the Planet Developer Relations newsletter, to get more information on the tech behind the workflows.