Lightweight GIS pipelines with fio-planet
By: Sean Gillies on April 27 2023
Planet’s new CLI has no builtin GIS capabilities other than what Planet’s APIs provide. To help make more sophisticated workflows possible, Planet is releasing a new package of command line programs for manipulating streams of GeoJSON features.
Making Command Line GIS Pipelines with fio-planet¶
Planet’s new command line interface (CLI) has no builtin GIS capabilities other than what Planet’s APIs provide. This is by design; it is meant to be complemented by other tools. To help make more sophisticated workflows possible, Planet is releasing a new package of command line programs, fio-planet, which let you build Unix pipelines for manipulating streams of GeoJSON features. Feature simplification before searching the Data API is one such application.
Planet’s new command line interface (CLI) has three main jobs. One is to help you prepare API requests. Planet’s APIs and data products have many options; the CLI provides tools to make specifying what you want easy. Another is to make the status of those requests visible. Some requests take a few minutes to be fulfilled, and the CLI can notify you when they are done, as in the following example.
planet orders wait 65df4eb0-e416-4243-a4d2-38afcf382c30 && cowsay "your order is ready for download"
 __________________________________
< your order is ready for download >
 ----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Or you may prefer to leave cows alone and just download the order when it is ready.
planet orders wait 65df4eb0-e416-4243-a4d2-38afcf382c30 && planet orders download 65df4eb0-e416-4243-a4d2-38afcf382c30
The third purpose of the CLI is to print API responses in a form that lets them be used as inputs to other CLI programs such as jq, the streaming JSON filter. CLI commands, generally speaking, print newline-delimited streams of JSON objects or GeoJSON features.
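This stream format, one JSON object per line, is simple to consume from any language. As a rough illustration, with made-up objects standing in for real API responses:

```python
import json

# A newline-delimited stream of JSON objects, as printed by the CLI.
# These objects are stand-ins, not real API responses.
stream = '{"id": "a", "n": 1}\n{"id": "b", "n": 2}\n'

# Each line is a complete JSON document; parse them one at a time.
objects = [json.loads(line) for line in stream.splitlines() if line]

print([obj["id"] for obj in objects])
```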
By design, the CLI has limited options for manipulating JSON and relies heavily on jq. The CLI docs have many examples of usage in combination with jq. Similarly, the CLI outsources GIS capabilities to other programs. One option is the inscrutable and venerable ogr2ogr, but it is not as handy with streaming JSON or GeoJSON as jq is.
To help make sophisticated command line geo-processing workflows more accessible, Planet is releasing a package of command line programs that let you build Unix pipelines for manipulating streams of GeoJSON features. Feature simplification before searching the Data API is one of the many applications.
Processing GeoJSON with fio and fio-planet¶
The Python package Fiona includes a CLI designed around streams of GeoJSON features. The fio-cat command streams GeoJSON features out of one or more vector datasets. The fio-load command performs the reverse operation, streaming features into GIS file formats. Two Shapefiles with the same schema can be combined into a GeoPackage file using the Unix pipeline below.
fio cat example1.shp example2.shp | fio load -f GPKG examples.gpkg
Note: pipeline examples in this blog post assume a POSIX shell such as bash or zsh. The vertical bar | is a “pipe”. It creates two processes and connects the standard output stream of one to the standard input stream of the other. GeoJSON features pass through this pipe from one program to the other.
Planet’s new fio-planet package fits into this conceptual framework and adds three commands to the fio toolbox: fio-filter, fio-map, and fio-reduce. These commands provide some of the features of spatial SQL, but act on features in a GeoJSON feature sequence instead of rows in a spatial table. Each command accepts a JSON sequence as input, evaluates an expression in the context of the sequence, and produces a new JSON sequence as output. These may be GeoJSON sequences.
- fio-filter evaluates an expression for each feature in a stream of GeoJSON features, passing those for which the expression is true.
- fio-map maps an expression over a stream of GeoJSON features, producing a stream of new features or other values.
- fio-reduce applies an expression to a sequence of GeoJSON features, reducing them to a single feature or other value.
In combination, many transformations are possible with these three commands.
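The three commands are named after Python’s own filter, map, and reduce, which the expression language also exposes. As a loose analogy, with plain numbers standing in for GeoJSON features:

```python
from functools import reduce

features = [1, 2, 3, 4, 5]

# fio-filter: pass only the features for which an expression is true.
kept = list(filter(lambda x: x > 2, features))

# fio-map: evaluate an expression for each feature, producing new values.
mapped = list(map(lambda x: x * 10, kept))

# fio-reduce: combine the whole sequence into a single value.
total = reduce(lambda a, b: a + b, mapped)

print(kept, mapped, total)  # [3, 4, 5] [30, 40, 50] 120
```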
Expressions take the form of parenthesized, comma-less lists which may contain other expressions. The first item in a list is the name of a function or method, or an expression that evaluates to a function. It’s a kind of prefix, or Polish notation. The second item is the function's first argument or the object to which the method is bound. The remaining list items are the positional and keyword arguments for the named function or method. The list of functions and callables available in an expression includes:
- Python builtins such as dict, list, and map
- From functools: reduce
- All public functions from itertools, for example, islice, and repeat
- All functions importable from Shapely 2.0, for example, Point, and unary_union
- All methods of Shapely geometry classes
Let’s look at some examples. Below is an expression that evaluates to a Shapely Point instance. Point is a callable instance constructor and the pair of 0 values are positional arguments. Note that the outermost parentheses of an expression are optional.
(Point 0 0)
Fio-planet translates that to the following Python expression.
from shapely import Point
Point(0, 0)
Next is an expression which evaluates to a Polygon, using Shapely’s buffer function. The distance parameter is computed from its own expression, demonstrating prefix notation once again.
(buffer (Point 0 0) :distance (/ 5 2))
The equivalent in Python is
from shapely import buffer
buffer(Point(0, 0), distance=5 / 2)
Fio-filter and fio-map evaluate expressions in the context of a GeoJSON feature and its geometry attribute. These are named f and g, respectively. For example, here is an expression that tests whether the distance from the input feature to the point at 0 degrees North, 0 degrees East is less than or equal to one kilometer (1,000 meters).
(<= (distance g (Point 0 0)) 1000)
fio-reduce evaluates expressions in the context of the sequence of all input geometries, which is named c. For example, the expression below dissolves all input geometries using Shapely's unary_union function.
(unary_union c)
Fio-filter evaluates expressions to true or false. Fio-map and fio-reduce will attempt to wrap expression results as GeoJSON feature objects unless raw results are specified. For more information about expressions and usage, please see the fio-planet project page.
Why does fio-planet use Lisp-like expressions for code that is ultimately executed by a Python interpreter? Fio-planet expressions are intended to allow a small subset of what you can do in GeoPandas, PostGIS, QGIS, or a bespoke Python program. By design, fio-planet’s expressions should help discriminate between workflows that can execute effectively on the command line using GeoJSON and workflows that are better executed in a more powerful environment. A domain-specific language (DSL), instead of a general purpose language, is intended to clarify the situation. Fio-planet’s DSL doesn’t afford variable assignment or flow control, for example. That’s supposed to be a signal of its limits. As soon as you feel fio-planet expressions are holding you back, you are right. Use something else!
The program ogr2ogr uses a dialect of SQL as its DSL. The QGIS expression language is also akin to a subset of SQL. Those programs tend to see features as rows of a table. Fio-planet works with streams, not tables, and so a distinctly different kind of DSL seemed appropriate. It adopted the Lisp-like expression language of Rasterio. Parenthesized lists are very easy to parse and Lisp has a long history of use in DSLs. Hylang is worth a look if you’re curious about a more fully featured Lisp for the Python runtime.
Simplifying shapes with fio-map and fio-reduce¶
Previously, the Planet Developers Blog published a deep dive into simplifying areas of interest for use with the Planet platform. The takeaway from that post is that it’s a good idea to simplify areas of interest as much as you can. Fio-planet brings methods for simplifying shapes and measuring the complexity of shapes to the command line alongside Planet’s CLI. Examples of using them in the context of the previous post are shown below. All the examples use a 25-feature shapefile. You can get it from rmnp.zip or access it in a streaming fashion as shown in the examples below.
Note: all examples assume a POSIX shell such as bash or zsh. Some use the program named jq. The vertical bar | is a “pipe”. It creates two processes and connects the standard output stream of one to the standard input stream of the other. The backward slash \ permits line continuation and allows pipelines to be typed in a more readable form.
A figure accompanies each example. The figures are rendered from a GeoJSON file by QGIS. The GeoJSON files are made by collecting, with fio-collect, the non-raw output of fio-cat, fio-map, or fio-reduce. For example, the pipeline below converts the zipped Shapefile on the web to a GeoJSON file on your computer.
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio collect > rmnp.geojson
Counting vertices in a feature collection¶
The vertex_count function, in conjunction with fio-map's --raw option, prints out the number of vertices in each feature. The default for fio-map is to wrap the result of every evaluated expression in a GeoJSON feature; --raw disables this. The program jq provides a nice way of summing the resulting sequence of numbers: jq -s “slurps” a stream of JSON objects into an array. The following pipeline prints the number 28,915.
Input:
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio map 'vertex_count g' --raw \
| jq -s 'add'
Output:
28915
Counting vertices after making a simplified buffer¶
One traditional way of simplifying an area of interest is to buffer it by some distance and then simplify it by a comparable distance. The effectiveness of this method depends on the nature of the data, especially the distance between vertices around the boundary of the area. There's no need to use jq in the following pipeline because fio-reduce prints out a sequence of exactly one value.
Input:
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio reduce 'unary_union c' \
| fio map 'simplify (buffer g 40) 40' \
| fio map 'vertex_count g' --raw
Output:
469
Variable assignment and flow control are not provided by fio-planet’s expression language. It is the pipes between commands which afford some assignment and logic. For example, the pipeline above is practically equivalent to the following Python program.
import fiona
from fiona.transform import transform_geom
from shapely import buffer, simplify, unary_union
from shapely.geometry import mapping, shape

# vertex_count is a helper function; see the definition in the
# "Counting vertices" section later in this document.
with fiona.open("zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip") as dataset:
    c = [shape(feat.geometry) for feat in dataset]
    g = unary_union(c)
    g = shape(transform_geom("OGC:CRS84", "EPSG:6933", mapping(g)))
    g = simplify(buffer(g, 40), 40)
    g = shape(transform_geom("EPSG:6933", "OGC:CRS84", mapping(g)))
    print(vertex_count(g))
Counting vertices after merging convex hulls of features¶
Convex hulls are an easy means of simplification. There are no distance parameters to tweak as there were in the example above. The --dump-parts option of fio-map turns the parts of multi-part features into separate single-part features. This is one of the ways in which fio-map can multiply its inputs, printing out more features than it receives.
The following pipeline prints the number 157.
Input:
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio map 'convex_hull g' --dump-parts \
| fio reduce 'unary_union c' \
| fio map 'vertex_count g' --raw
Output:
157
Counting vertices after merging the concave hulls of features¶
Convex hulls simplify, but also dilate concave areas of interest. They fill the "bays", so to speak, and this can be undesirable. Concave hulls do a better job at preserving the concave nature of a shape and result in a smaller increase of area.
The following pipeline prints the number 301.
Input:
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio map 'concave_hull g 0.4' --dump-parts \
| fio reduce 'unary_union c' \
| fio map 'vertex_count g' --raw
Output:
301
Using fio-planet with the Planet CLI¶
Now for more specific examples of using fio-planet with Planet’s CLI on the command line. If you want to spatially constrain a search for assets in Planet’s catalog or clip an order to an area of interest, you will need a GeoJSON object. The pipeline below creates one based on the data used above and saves it to a file. Note that the fourth piece of the pipeline forces the output GeoJSON to be two-dimensional. Planet’s Data API doesn’t accept GeoJSON with Z coordinates.
fio cat zip+https://github.com/planetlabs/fio-planet/files/10045442/rmnp.zip \
| fio map 'concave_hull g 0.4' --dump-parts \
| fio reduce 'unary_union c' \
| fio map 'force_2d g' \
| fio collect \
> rmnp.geojson
Incorporating that GeoJSON object into a Data API search filter is the next step. The filter produced by the command shown below will find items acquired between Feb 14 and Feb 25, 2023 that intersect the shape of Rocky Mountain National Park.
planet data filter \
--date-range acquired gt 2023-02-14 \
--date-range acquired lt 2023-02-25 \
--geom=rmnp.geojson \
> filter.json
With a filter document, stored here in filter.json – the name of the filter doesn’t matter – you can make a filtered search and stream the GeoJSON results into a file. It’s a common practice to use .geojsons, plural, as the file extension for newline-delimited sequences of GeoJSON features.
planet data search PSScene --filter=filter.json > results.geojsons
The fio-planet commands are useful for processing search results, too. The results.geojsons file contains a stream of GeoJSON features which represent some PSScene items in Planet’s catalog. Here’s an example of finding the eight scenes that cover 40.255 degrees North and 105.615 degrees West (the summit of Longs Peak in Rocky Mountain National Park).
Input:
cat results.geojsons \
| fio filter 'contains g (Point -105.615 40.255)' \
| jq '.id'
Output:
"20230222_172633_18_247f"
"20230221_165059_39_241b"
"20230221_165057_22_241b"
"20230220_171805_27_2251"
"20230219_172658_28_2461"
"20230217_172940_13_2474"
"20230216_171328_72_2262"
"20230214_173045_31_2489"
To further filter by the degree to which the ground is visible in each scene – which could also have been specified in the search filter, by the way – we can add a jq step to the pipeline.
cat results.geojsons \
| fio filter 'contains g (Point -105.615 40.255)' \
| jq -c 'select(.properties.visible_percent > 90)' \
| fio collect \
> psscenes.geojson
Fio-filter and fio-map can also be used as checks on the number of vertices and the area of GeoJSON features. Let’s say that you want to keep your areas of interest to 500 vertices or less and no more than 2,000 square kilometers. You can ask fio-map to print true or false or ask fio-filter to screen out features that don’t meet the criteria.
Input:
fio cat rmnp.geojson \
| fio map -r '& (< (vertex_count g) 500) (< (area g) 2000e6)'
Output:
true
Note that the value returned by the builtin area function has units of meters squared. The area of the feature in the rmnp.geojson file is 1,117.4 square kilometers and has 301 vertices.
Creating search geometries on the command line¶
What if you want to do a quick search that isn’t related to any particular GIS dataset? The new fio-map command can help.
All of Shapely’s geometry type constructors are available in fio-planet expressions and can be used to create new shapes directly on the command line. Remember that fio-map’s --raw/-r option specifies that the outputs of the command will not be wrapped in GeoJSON feature objects, but returned in their natural, raw form. Another fio-map option, --no-input/-n, specifies that the given expression will be mapped over the sequence [None], as with jq --null-input/-n, producing a single output. Together, these options let fio-map produce new GeoJSON data that is not based on any existing data. On the command line you can combine the options as -nr or -rn. If it helps, think of -rn as “right now”.
Input:
fio map -rn '(Point -105.615 40.255)'
Output:
{"type": "Point", "coordinates": [-105.615, 40.255]}
This value can be saved to a file and used with the planet-data-filter program.
fio map -rn '(Point -105.615 40.255)' > search.geojson
planet data filter --geom search.geojson > filter.json
planet data search PSScene --filter filter.json
Or it can be used inline to make a search without saving any intermediate files at all, using POSIX command substitution with the $(...) syntax, although nested substitution takes a heavy toll on the readability of commands. Please speak with your tech lead before putting pipelines like the one below into production.
planet data search PSScene --filter="$(planet data filter --geom="$(fio map -rn '(Point -105.615 40.255)')")"
Creating and simplifying GeoJSON features for use with the Planet CLI are two of the applications for fio-planet’s map, filter, and reduce commands. You can surely think of more! Combine them in different ways to create new pipelines suited to your workflows and share your experience with others in Planet’s Developers Community or in the fio-planet discussion forum.
Next Steps¶
Keep up to date with the fio-planet GitHub repository and the latest project documentation. Chat with us about your experiences and issues using Fiona and fio-planet with Planet data at https://community.planet.com/developers-55. Follow us on Twitter @PlanetDevs. Sign up for Wavelengths, the Planet Developer Relations newsletter, to get more information on the tech behind the workflows.
Planet's Role in Sustaining the Python Geospatial Stack
By: Sean Gillies on March 16 2023
Can Planet data or the Planet platform be used with Python? It can, and not by accident. Planet is working to make it so. This post attempts to explain what the Python Geospatial Stack is and Planet’s role in keeping the stack in good shape so that developers continue to get value from it.
Planet’s customers don’t only use Planet products via commercial or open source desktop GIS or partner platforms. Many of you are integrating Planet’s products and services into custom-built GIS systems, which use the open source Python Geospatial Stack. This post attempts to explain what the Python Geospatial Stack is and Planet’s role in keeping the stack in good shape so that developers like you continue to get value from it. Can Planet data or the Planet platform be used with Python? It can, and not by accident. Planet is working to make it so. You can help, too.
What is the Python Geospatial Stack?¶
The Python Geospatial Stack is a set of Python packages that use a smaller set of non-standard system libraries written in C/C++: GEOS, PROJ, and GDAL. GEOS is a library of 2-D computational geometry routines. PROJ is a library for cartographic projections and geodetic transformations. GDAL is a model for computing with raster and vector data and a collection of format drivers to allow data access and translation on your computer or over your networks. Planet engineers, including myself, have contributed significantly to these system libraries. Principal Engineer Frank Warmerdam is the author of GDAL, a longtime maintainer of PROJ version 4, and a major contributor to GEOS.
Shapely, Pyproj, Fiona, Rasterio, GeoPandas, PDAL, and Xarray are the core of the stack. Shapely draws upon GEOS and is useful to developers for filtering asset catalog search results, comparing catalog item footprints to areas of interest, and much more. Pyproj wraps PROJ and is helpful for calculating the projected area of regions that are described in longitude and latitude coordinates, and more. Fiona and Rasterio are based on GDAL and allow developers to read and write vector and raster data. PDAL translates and manipulates point cloud data. GeoPandas and Xarray use Fiona and Rasterio and provide higher levels of abstraction for analysis of column-oriented tabular and gridded scalar data, respectively.
How did the stack come about?¶
Why are so many organizations using Python for GIS work? Isn’t it slow? Only an instructional language? Timing explains a lot. When Bruce Dodson started looking at alternatives to Avenue, the scripting language of Esri's ArcView GIS version 3, Python was ready. Python had a good extension story in 2000, meaning that although Python was relatively slow, it was fairly easy to extend with fast code written in C.
When you compute and dissolve the convex hulls of multiple shapes using Shapely, all of the intensive calculation is done by native code, not Python bytecode. The open source native code (GDAL, GEOS, PROJ, etc) just happened to rise up at the same point in time. Timing really is everything.
To close the loop, the Geospatial Stack has helped make the Python language sticky. Developers choose Python from many options for the availability of solid, feature-rich, decently documented libraries in a particular domain, and end up staying for all the other nice things about the Python language and community. The reason why Python is the second-best language for so many domains is that the same discovery happened in numerical analysis, machine learning, image processing, and elsewhere.
How is Planet helping?¶
Software and software communities need care and feeding. Planet is involved through financial sponsorship, project governance, code maintenance, and builds of binary distributions.
In 2021, Planet became an inaugural platinum sponsor of GDAL. Planet is helping to pay for infrastructure costs and for the labor of full-time maintainers for GDAL and affiliated projects like GEOS and PROJ. In addition, two Planet engineers (Frank Warmerdam and I) serve on GDAL’s project steering committee. The impact of this sponsorship is huge. GDAL has a better and faster build system, which makes it easier to contribute to and which reduces the cost of every bug fix and new feature. Most importantly, the sponsorship makes it possible for GDAL’s longtime maintainers to avoid burnout, stay involved, and mentor their eventual successors.
Planet’s Developer Relations team is mainly involved at the Python level. I’m the release manager for Fiona and Rasterio. I make sure that binary distributions (aka “wheels”) for Linux, MacOS, and Windows are uploaded to Python’s Package Index so that when developers run “pip install rasterio” almost everybody gets pre-compiled Python packages with GDAL, GEOS, and PROJ batteries included. Packages that “just work” for many cases on laptops and in hosted notebooks.
A major new version of Shapely was released at the end of 2022 through collaboration with core GeoPandas developers. Shapely 2.0 adds vectorized operations that radically speed up GeoPandas and keep this piece of the stack relevant as projects like GeoParquet start to change the nature of vector data.
Fiona 1.9.0 was released on Jan 20, 2023 and also provides a boost to GeoPandas. Planet’s Developer Relations team is currently building new tools that integrate with this new version of the Geospatial Stack’s vector data package. They will be announced soon.
Multiple teams at Planet helped make Rasterio 1.3.0, Shapely 2.0.0, and Fiona 1.9.0 successful releases by testing beta releases of these packages. Planet is financially sustaining the open source projects that form the foundation of the stack. The DevRel team is engaged with the open source communities that write the stack’s code. At Planet we consider the health of the Python Geospatial Stack to be a big factor in the overall experience of our platform.
Next Steps¶
Are you using Fiona, Rasterio, or Shapely? Upgrade to the latest versions and try them out. Chat with us about any ideas or issues with using these packages with Planet data at https://community.planet.com/developers-55. Follow us on Twitter @PlanetDevs.
Simplifying Your Complex Area of Interest: a Planet Developers Deep Dive
By: Sean Gillies on December 15 2022
While your area of study may be very complex, a simplified representation of it works best with Planet’s platform.
Introduction¶
Planet’s APIs use spatial information on both ends, input and output. You can filter data catalog search results by comparing items to an area of interest. You can point a SkySat at the region on Earth that you are studying. You can order products derived from these sources and have them clipped to spatial boundaries that you provide. Planet Data API search results are represented using GeoJSON features. This allows the footprints of, e.g., PlanetScope scenes, to be viewed on a map along with your area of interest.
The shapes of these areas of interest, regions, and boundaries could be as simple as triangles and squares, or could be less perfect and more intricate agricultural fields or watersheds. The complexity of these shapes can adversely impact your experience as much as their area does, adding time to your analysis. But don’t worry—this post will help you understand how to analyze and treat that complexity so that you can get the most out of Planet data.
What do we mean by complexity?¶
By and large, the shapes of features on Earth’s surface are represented in Planet APIs by GeoJSON geometry objects. A triangular shape on the ground, for example, is represented by a JSON object with two members. The first member is a coordinate array consisting of three unique points (pairs of numbers) plus the first point, repeated to explicitly close the coordinate sequence. The second member is type, which has the value Polygon.
{
"coordinates": [
[
[
-1.859709,
15.634462
],
[
-3.933013,
12.424525
],
[
-0.423789,
12.315333
],
[
-1.859709,
15.634462
]
]
],
"type": "Polygon"
}
By complexity of a shape, we mean the number of edge segments or the number of coordinate pairs that are the boundaries of the segments. A square polygon has five such coordinate pairs, which we will refer to as vertices. One coordinate pair is a vertex. More realistic polygons might have 100 vertices or more. A shape with more vertices is more complex than a shape with fewer. Imagine a copy of the triangle above, but with one additional vertex added midway along one side, collinear with the original points on that side. This new polygon has the same area as the original triangle and is more or less equivalent to the original as far as common algorithms are concerned. Its domain (internal connected region) is the same as the triangle’s domain. But it is more complex due to the extra vertex. Complexity increases with each additional vertex, and complexity comes with a cost.
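The claim about the collinear vertex can be checked with the shoelace formula. The coordinates below are hypothetical, chosen only to make the arithmetic easy:

```python
def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

triangle = [(0, 0), (4, 0), (0, 4)]
# The same triangle with a redundant collinear vertex at (2, 0):
# one more vertex, more complexity, but exactly the same domain.
with_extra = [(0, 0), (2, 0), (4, 0), (0, 4)]

print(shoelace_area(triangle), shoelace_area(with_extra))  # 8.0 8.0
```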
Even more about complexity¶
Why did we say that the complexity of shapes adversely impacts your experience? The algorithm we use to implement Orders API clipping involves reducing your clipping shape into a set of triangles. The time it takes to compute this triangulation is on the order of n * log(n), where n is the number of vertices of the clipping shape. In other words, it takes about 20 times longer to clip using a 100-vertex shape compared to a 10-vertex shape, and about 15 times longer again to scale from 100 to 1,000 vertices. This is a fact of life for geographic information systems: even our best general-purpose geometry algorithms struggle with complex shapes.
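These ratios follow directly from the n * log(n) model; the base of the logarithm cancels out when costs are compared:

```python
import math

def cost(n):
    # Triangulation cost model: n * log(n). Any log base gives the
    # same ratios, since the base cancels in the division below.
    return n * math.log(n)

print(round(cost(100) / cost(10)))    # ratio for 10 -> 100 vertices
print(round(cost(1000) / cost(100)))  # ratio for 100 -> 1000 vertices
```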
Computing the bounding box of your shape—as is done in Data API search and Tasking API orders—takes time on the order of n, where n is again the number of vertices. Complexity affects these operations almost as much as it affects clipping. It takes about 10 times longer to compare a 100-vertex shape to Planet’s catalog compared to a 10-vertex shape.
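A bounding box computation visits each vertex exactly once, which is why its cost is linear in n. A minimal sketch, using the triangle coordinates from the GeoJSON example above:

```python
def bounding_box(coords):
    """Compute (minx, miny, maxx, maxy) in a single O(n) pass."""
    xs = [x for x, y in coords]
    ys = [y for x, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

# The triangle ring from the GeoJSON example above.
ring = [(-1.859709, 15.634462), (-3.933013, 12.424525),
        (-0.423789, 12.315333), (-1.859709, 15.634462)]
print(bounding_box(ring))
```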
The greatest impact on your experience, however, is due to the cost of copying and passing long arrays of coordinates throughout Planet’s systems. Data transfer, whether from disk to memory, or across a network, is orders of magnitude slower than operations in a computer’s CPU. Shape data volume increases with the number of shape vertices: this is a fact of life. The more complex your shapes, the slower your search, order, and subscription results.
What about the area of shapes? Is this property not significant? Planet’s catalog is spatially indexed. The time to find results increases only slightly with the area of your shape. The area of a shape relative to the ground resolution of Planet product pixels affects search response time mainly by yielding more results that must be serialized and sent back to you. The area of your shapes thus affects your experience, but it is a different problem from the one posed by shape complexity.
Planet APIs constrain the complexity of your shapes so that you don’t have the experience of being in computational limbo. Data volume is not directly correlated to the area of your shapes and so area is not explicitly limited. Data API search requests are limited to 1MB in size, which limits the number of vertices of search shapes in these requests. Orders and Subscriptions API clip shape vertices are limited.
However, the Orders or Subscriptions API clip tool is most valuable whenever you have areas of interest that are small compared to the size of Planet imagery scenes. Use it, because it reduces the time to apply other tools like bandmath and harmonization, and reduces the amount of data you need to download. The important thing is to manage the complexity of shapes that you provide in requests, using only as much as needed to get good results.
Managing complexity¶
Now that we’ve defined complexity and its impact, let’s consider how to manage complexity. Our goal with managing complexity is to create a simpler shape that covers the entire original area while increasing total area as little as possible.
Counting vertices¶
The first step is measuring the complexity of your area of interest. Specifically, this means counting the number of vertices. ArcGIS and QGIS allow this through their expression engines as !shape!.pointcount and num_points($geometry), respectively. QGIS also exposes this as a derived attribute of all features. In PostGIS, you can use the function ST_NPoints.
In a Python program you can count vertices using the Shapely library. Here is an example that uses recursion.
from shapely.geometry import shape

def vertex_count(obj) -> int:
    """Count the vertices of a GeoJSON-like geometry object.

    Parameters
    ----------
    obj: a GeoJSON-like mapping or an object that provides __geo_interface__

    Returns
    -------
    int
    """
    shp = shape(obj)
    if hasattr(shp, "geoms"):
        return sum(vertex_count(part) for part in shp.geoms)
    elif hasattr(shp, "exterior"):
        return vertex_count(shp.exterior) + sum(vertex_count(rng) for rng in shp.interiors)
    else:
        return len(shp.coords)
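If Shapely isn’t available, the same count can be made directly from the GeoJSON coordinate arrays. This stdlib-only variant is not from the original post, just an alternative sketch:

```python
def vertex_count_geojson(geom):
    """Count coordinate pairs in a GeoJSON geometry dict (no Shapely)."""
    if geom["type"] == "GeometryCollection":
        return sum(vertex_count_geojson(g) for g in geom["geometries"])

    def count(coords):
        # A position is a list/tuple that starts with a number; anything
        # else is a nested array of positions (ring, polygon, multi-part).
        if isinstance(coords[0], (int, float)):
            return 1
        return sum(count(part) for part in coords)

    return count(geom["coordinates"])

# The triangle from the earlier GeoJSON example: three unique points
# plus the repeated closing point makes four vertices.
triangle = {"type": "Polygon",
            "coordinates": [[[-1.859709, 15.634462], [-3.933013, 12.424525],
                             [-0.423789, 12.315333], [-1.859709, 15.634462]]]}
print(vertex_count_geojson(triangle))  # 4
```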
If your shape is within the vertex limit, you can use it as an Orders or Subscriptions API clip geometry with no modifications.
If your shape has fewer than about 20,000 vertices, you can use it for a Data API search with no modifications. This is because the Data API strictly limits search requests to 1MB of JSON text. You can assume each vertex, or coordinate pair, is represented in the GeoJSON by 40-46 bytes of text with each coordinate printed to 15 decimal places of precision (the default of OGR and QGIS). Given the 1MB limit and the amount of text required per vertex, this sets a rough upper limit of 20,000 vertices.
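That rough limit is simple arithmetic on the figures above:

```python
# Rough request-size estimate: bytes of GeoJSON text per vertex times
# the vertex count, compared against the Data API's 1MB request limit.
BYTES_PER_VERTEX = 46    # upper end of the 40-46 byte range above
LIMIT = 1_000_000        # 1MB

max_vertices = LIMIT // BYTES_PER_VERTEX
print(max_vertices)  # about 21,700 -> a rough upper limit of 20,000
```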
Reducing complexity¶
The recipe we suggest for reducing complexity is this:

1. Dissolve internal boundaries in your area of interest.
2. If that doesn’t bring your area of interest under the limit, compute the convex hull.
3. Buffer to compensate for concavities, or use the latest in outer concavity algorithms.

If your area of interest has too many vertices, the first thing to check is whether redundant interior vertices can be eliminated by dissolving parts of your area of interest into a larger whole. For example, your area of interest might be a set of adjacent districts, watersheds, or zones extracted from other imagery. These shapes may share edges and vertices along those edges. Consider this map of Rocky Mountain National Park’s (RMNP) wilderness patrol zones. These 25 zones share boundaries and have redundant vertices.
This area is interesting in that some of its boundaries follow natural features of the landscape and some are purely matters of real estate. It is several PlanetScope scenes high and about one scene wide. The 25 zones cover 1081 km² and are described by a total of 28,915 vertices. The vertices are spaced about 10-20 meters apart along the zone boundaries.
This area of interest is not suited for search or clipping in this form. It is too complex. QGIS’s dissolve geoprocessing tool can eliminate 24,302 of these vertices along the interior boundaries, yielding a single shape with 5613 vertices. ArcGIS also has a dissolve tool. If you’re managing areas of interest with PostGIS, you can use `ST_UnaryUnion`. In Python, Shapely provides `unary_union`. All of these are capable of dissolving areas of interest before Planet searches, orders, or subscriptions are made.
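As a toy illustration of what dissolving does, here is a Shapely sketch; the two squares are hypothetical stand-ins for adjacent zones:

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two adjacent 1x1 squares that share an edge, like neighboring zones.
left = box(0, 0, 1, 1)
right = box(1, 0, 2, 1)

# Dissolving removes the interior boundary, leaving one 2x1 polygon.
dissolved = unary_union([left, right])
```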
An 80% reduction in the number of vertices without losing any area is a step in the right direction. You could search with this new shape, since it is well under the 1MB size limit, but 5613 vertices is still beyond the vertex limits for other Planet APIs. Fortunately, we have more cards to play. We can use simpler approximations of our areas of interest.
Convex hull¶
The smallest convex set that contains a shape is called its convex hull. The boundary of the convex hull of your area of interest looks like an elastic band stretched around it. It is an excellent approximation for mostly convex shapes, which it hugs (headlands, for example), but less excellent for shapes with concavities, which it fills in (bays, for example).
A useful property of the convex hull is that it covers every point in your area of interest. When it comes to searching and clipping, this means that you will find every catalog item that you would have found with the shape of your complex area of interest and will get every bit of data in your order or subscription that you would get if you could use your exact area of interest. A convex hull is radically less complex than the shape it surrounds. In this example, it is only 43 vertices. 99.9% of the original complexity has been removed. Planet API requests using this simplified shape will be speedily processed.
On the other hand, a convex hull is necessarily larger in area than your area of interest. In the present example, we have a +18% increase in area. That translates to a small chance of false positive search results: finding a Planet scene which intersects a filled-in “bay,” but not any of the original zones. It also means you’ll download data in those same filled-in regions that you may not use. In this example, 191 km² more.
Convex hull, the sequel¶
It turns out that one way to get a better approximation of your area of interest is to use even more convex hulls. If you take the convex hull of each zone separately and then dissolve those 25 convex hulls, you get a shape that hugs the original boundary more tightly. 99.4% of the complexity has been removed while only increasing the area by 5.7% (62 km²). This union of hulls is a great fit for Data API searching and Orders and Subscriptions API clipping.
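In Shapely, the union-of-hulls approach is a one-liner per step. A sketch, using a hypothetical L-shaped pair of features in place of the park zones:

```python
from shapely.geometry import shape
from shapely.ops import unary_union


def union_of_convex_hulls(features):
    """Convex hull of each feature's geometry, then dissolve the hulls."""
    return unary_union([shape(f["geometry"]).convex_hull for f in features])


# Two rectangles forming an L: hypothetical stand-ins for adjacent zones.
features = [
    {"geometry": {"type": "Polygon",
                  "coordinates": [[(0, 0), (1, 0), (1, 3), (0, 3), (0, 0)]]}},
    {"geometry": {"type": "Polygon",
                  "coordinates": [[(1, 0), (3, 0), (3, 1), (1, 1), (1, 0)]]}},
]

hulls = union_of_convex_hulls(features)
single_hull = unary_union([shape(f["geometry"]) for f in features]).convex_hull
# The union of per-feature hulls is tighter than one overall convex hull.
```

For this L, the union of hulls covers an area of 5 square units, while the single convex hull fills the concavity and covers 7.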
Simplified buffers¶
If your area of interest is dominated by concavities, the convex hull approach may not be useful at all. For example, the convex hull of an N or Z-shaped region is a full rectangle and maximizes the number of false positive search results and extra clipped data volume. In such cases, we’ve often recommended the approach of buffering and then simplifying areas of interest because the outcome can follow the outline of the original concave shape better than a convex hull can.
The difficulty with buffering and simplifying is that there are more parameters to adjust compared to the convex hull approximation. Success can vary greatly depending on these parameters and on the complexity of your area of interest.
By simplification, we mean removal of vertices from the boundary of a shape, leaving the most important ones. The Visvalingam–Whyatt algorithm uses local, area-based criteria for importance. The widely used Douglas–Peucker algorithm uses global, distance-based criteria.
Imagine the original vertices of the boundary of your area of interest and the line segments between them, and another smaller set of vertices and line segments which approximates that boundary. The Hausdorff distance is the distance that we would need to dilate each of these sets so that they mutually cover each other. Given a tolerance expressed as such a distance, the Douglas–Peucker algorithm finds a simplified set of vertices whose Hausdorff distance from the original is no greater than that tolerance. Both algorithms result in loss of boundary detail, erosion of convex corners, and filling of concave corners. Erosion of convex corners has an impact on Data API search and Orders and Subscriptions API clipping: searches may miss items in the catalog and pixels may be missing from delivered data. Thus, simplification must practically be preceded by buffering.
The shape buffering operations offered by ArcGIS, QGIS, PostGIS, and Shapely have parameters setting the number of buffer segments around shape corners and the distance of these segments from the original shape. The effect of buffering on complexity is somewhat complicated. A buffer operation is constructive and creates a new shape with an entirely new set of vertices. More vertices are necessarily created around corners of the original area of interest. If you buffer a point and specify five segments per quarter circle around the point (the QGIS default; PostGIS’s default is eight), you get a 20-sided polygon (an icosagon) with 20 vertices. For buffer distances that are large compared to the distance between vertices, you’ll observe that buffering reduces complexity. In the extreme, a buffer shape approaches that which you would get when buffering a point.
Such a large buffer is useless in practice. You’ll want to use buffer distances smaller than the size of the area of interest so as not to unnecessarily inflate its area. However, for buffer distances that are the same order of magnitude as the distance between original vertices, or smaller, you’ll observe that buffering tends to increase complexity. On convex corners, 1-10 extra vertices may be created. Conversely, some vertices may be removed at concave corners. For the relatively small buffer distances that we are likely to use, the net result is to increase complexity. Buffering the RMNP zones by a distance of 100 meters and using five segments per quarter-circle gives a shape which has 598 more vertices (+11%).
If you subsequently perform a Douglas-Peucker simplification (available from ArcGIS, PostGIS, QGIS, Python, and other software) with a tolerance less than or equal to the buffer distance, extra corner vertices will be removed and some non-corner vertices will also be removed. Using a tolerance greater than the buffer distance is likely to result in loss of some of your area of interest due to convex corner erosion.
The best practice for buffering and simplifying without creating too many new vertices is this: use a buffer distance larger than the distance between vertices of your original shape, then simplify with a tolerance that is no larger than the buffer distance.
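Expressed with Shapely, that practice is two chained calls. A sketch, where the L-shaped polygon and the 0.1 distance are hypothetical toy values; a real area of interest wants a buffer distance larger than its vertex spacing:

```python
from shapely.geometry import Polygon

# A hypothetical concave (L-shaped) area of interest.
aoi = Polygon([(0, 0), (3, 0), (3, 1), (1, 1), (1, 3), (0, 3)])

distance = 0.1  # toy buffer distance
approx = aoi.buffer(distance).simplify(distance)  # tolerance <= buffer distance

# The simplified buffer still covers every point of the original AOI.
assert approx.covers(aoi)
```

Because the simplification tolerance does not exceed the buffer distance, convex-corner erosion cannot eat into the original shape.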
Outer concave hulls¶
The future of approximating the shapes of areas of interest which have both convexities and concavities is the outer concave hull. Version 3.11 of the GEOS library provides an outer concave hull algorithm. This feature is thus coming soon to QGIS, PostGIS, and Shapely 2.0.0.
Below is a short script which uses an unreleased version of Shapely (2.0rc1) to construct the concave hulls of Rocky Mountain National Park’s Wilderness Patrol Zones, dissolve them, and write the results to a GeoJSON file which can be opened and displayed in QGIS.
```python
import json

import fiona
from shapely import concave_hull, unary_union
from shapely.geometry import mapping, shape

with fiona.open("Wilderness_Patrol_Zones.shp") as collection:
    hulls = [
        concave_hull(shape(feat["geometry"]), ratio=0.4) for feat in collection
    ]

dissolved_hulls = unary_union(hulls)
feat = dict(type="Feature", properties={}, geometry=mapping(dissolved_hulls))

with open("dissolved-concave-hulls.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": [feat]}, f)
```
This concave hull hugs the area of interest more closely than the convex hull and still has the important property of an outer hull: every point in the original area of interest is covered.
Conclusion¶
When your area of interest is overly complex, there are tools available to handle the complexity and maximize the value of Planet's platform. Desktop GIS software and programming libraries can measure the complexity of your areas of interest and provide tools to help you simplify your shapes to meet the vertex limit of Planet APIs: convex hulls, simplified buffers, and concave hulls. Today, convex hulls and simplified buffers are the most accessible methods. In the future, concave hulls may make the less predictable simplified buffer approach obsolete.
Next Steps¶
Check out our new simplification helpers in Planet Explorer. Let us know of any interesting simplification workflows you’ve done by Tweeting us @PlanetDevs. Sign up for Wavelengths, the Planet Developer Relations newsletter, to get more information on the tech behind the workflows.