Best Practices for working with Large AOIs

Overview¶

In this guide, you'll learn more about best practices for working with a large amount of image items in the Data API.

If you haven't already, see the "Getting Started" guide for more information on how to acquire an API key and setup your development environment.

Paginate Through a Large Search¶

When a search returns more then 250 results those results will be broken up into several pages. You must request each page seperately, below is a common pagination pattern.

examples/pagination.py

import os
import requests

session = requests.Session()
session.auth = (os.environ['PLANET_API_KEY'], '')

# this large search filter produces all PlanetScope imagery for 1 day
very_large_search = {
  "name": "very_large_search",
  "item_types": ["PSScene"],
  "filter": {
      "type": "DateRangeFilter",
      "field_name": "acquired",
      "config": {
        "gte": "2016-07-01T00:00:00.000Z",
        "lte": "2016-07-02T00:00:00.000Z"
      }
    }
}

# Create a Saved Search
saved_search = \
    session.post(
        'https://api.planet.com/data/v1/searches/',
        json=very_large_search)

# after you create a search, save the id. This is what is needed
# to execute the search.
saved_search_id = saved_search.json()["id"]

# What we want to do with each page of search results
# in this case, just print out each id
def handle_page(page):
    for item in page["features"]:
        print item["id"]

# How to Paginate:
# 1) Request a page of search results
# 2) do something with the page of results
# 3) if there is more data, recurse and call this method on the next page.
def fetch_page(search_url):
    page = session.get(search_url).json()
    handle_page(page)
    next_url = page["_links"].get("_next")
    if next_url:
        fetch_page(next_url)

first_page = \
    ("https://api.planet.com/data/v1/searches/{}" +
        "/results?_page_size={}").format(saved_search_id, 5)

# kick off the pagination
fetch_page(first_page)

➜ python examples/pagination.py
219842_2261109_2016-07-01_0c2b
219842_1962716_2016-07-01_0c2b
219842_1962713_2016-07-01_0c2b
219842_1962613_2016-07-01_0c2b
219842_1962714_2016-07-01_0c2b
...

Parallelizing Requests¶

If you need to perform a large number of API operations, such as activating many items, your request rate can be improved by sending requests in parallel.

In Python this can be done using a ThreadPool. The size of the ThreadPool will define the parallelism of requests.

Due to Python's architecture ThreadPools are most useful when doing I/O bound operations, like talking to an API. If the operations were CPU bound, this would not be a good idea See: GIL.

Note: If you increase the request rate too much you will run into rate limiting, see the next section for how to deal with that.

examples/parallelism.py

import os
import requests
from multiprocessing.dummy import Pool as ThreadPool

# setup auth
session = requests.Session()
session.auth = (os.environ['PLANET_API_KEY'], '')

def activate_item(item_id):
    print "activating: " + item_id

    # request an item
    item = session.get(
        ("https://api.planet.com/data/v1/item-types/" +
        "{}/items/{}/assets/").format("PSScene", item_id))

    # request activation
    session.post(
        item.json()["visual"]["_links"]["activate"])

# An easy way to parallise I/O bound operations in Python
# is to use a ThreadPool.
parallelism = 5
thread_pool = ThreadPool(parallelism)

# Open up a file of ids that we have already retrieved from a search
with open('examples/1000_PSScene_ids.txt') as f:
    item_ids = f.read().splitlines()[:100] # only grab 100

# In this example, all items will be sent to the `activate_item` function
# but only 5 will be running at once
thread_pool.map(activate_item, item_ids)

➜ python examples/parallelism.py
activating: 219842_2261109_2016-07-01_0c2b
 activating: 217416_1962722_2016-07-01_0c2b
activating: 219842_1962614_2016-07-01_0c2b
activating: 217416_1761305_2016-07-01_0c2b
 activating: 217416_1660809_2016-07-01_0c2b
 ...

Working with Rate Limiting¶

If you handle them correctly, rate limiting errors will be a normal and useful part of working with the API.

The Planet API responds with HTTP 429 when your request has been denied due to exceeding rate limits.

The following example shows you how to identify a rate limit error and then retry with an exponential backoff. An exponential backoff means that you wait for exponentially longer intervals between each retry of a single failing request.

The retrying library provides a decorator that you can add to any method to give it various types of retries.

examples/rate_limiting.py

import os
import requests
from multiprocessing.dummy import Pool as ThreadPool
from retrying import retry

# setup auth
session = requests.Session()
session.auth = (os.environ['PLANET_API_KEY'], '')

# "Wait 2^x * 1000 milliseconds between each retry, up to 10
# seconds, then 10 seconds afterwards"
@retry(
    wait_exponential_multiplier=1000,
    wait_exponential_max=10000)
def activate_item(item_id):
    print "attempting to activate: " + item_id

    # request an item
    item = session.get(
        ("https://api.planet.com/data/v1/item-types/" +
        "{}/items/{}/assets/").format("PSScene", item_id))

     # raise an exception to trigger the retry
    if item.status_code == 429:
        raise Exception("rate limit error")

    # request activation
    result = session.post(
        item.json()["visual"]["_links"]["activate"])

    if result.status_code == 429:
        raise Exception("rate limit error")

    print "activation succeeded for item " + item_id

parallelism = 50

thread_pool = ThreadPool(parallelism)

with open('examples/1000_PSScene_ids.txt') as f:
    item_ids = f.read().splitlines()[:400] # only grab 100

thread_pool.map(activate_item, item_ids)

➜ python examples/rate_limiting.py
attempting to activate: 217416_1661222_2016-07-01_0c2b
attempting to activate: 217416_1962721_2016-07-01_0c2b
attempting to activate: 217416_1661320_2016-07-01_0c2b
attempting to activate: 217416_1458713_2016-07-01_0c2b
 attempting to activate: 217416_1761305_2016-07-01_0c2b
activation succeeded for item 217416_1761714_2016-07-01_0c2b
 activation succeeded for item 217416_2062808_2016-07-01_0c2b
 attempting to activate: 217416_1761712_2016-07-01_0c2b
attempting to activate: 217416_1962820_2016-07-01_0c2b
activation succeeded for item 217416_1153521_2016-07-01_0c2b

Rate this guide:

Best Practices for working with Large AOIs

last updated: March 18, 2024

Overview¶

Paginate Through a Large Search¶

Parallelizing Requests¶

Working with Rate Limiting¶