
API Rate Limiting Best Practices

Omar Santos
Cisco Employee

Limiting the number of requests to a REST API has several benefits, both for the API provider and its users. Implementing rate limiting and encouraging API clients to avoid excessive querying can help ensure a more stable, performant, and secure service.

The Cisco PSIRT OpenVuln API has the following limits:

RATE LIMITS
5 calls per second
30 calls per minute

Here are some of the benefits and best practices for API clients:

Benefits of limiting requests to a REST API:

  1. Improved performance: By limiting the number of requests, you can reduce the risk of server overload, ensuring that the API remains responsive and performant for all users.

  2. Fair usage: Rate limiting helps distribute resources fairly among users, preventing a few heavy users from monopolizing the API and degrading the experience for others.

  3. Security: Limiting the number of requests can protect the API from abuse, such as Distributed Denial of Service (DDoS) attacks, which attempt to overwhelm the server with a flood of requests.

  4. Encouraging efficient use: By enforcing a limit on API calls, you encourage clients to optimize their usage and minimize unnecessary calls, leading to more efficient and better-designed applications.

Best practices for API clients to avoid excessive querying:

  1. Caching: Store the results of API calls locally when possible to reduce the need for repeated requests, especially if the data is unlikely to change frequently.

requests_cache is a Python library designed to provide a transparent caching mechanism for the popular requests library. It allows you to cache HTTP responses to reduce the number of requests made to an API or web service, which can help improve performance, reduce latency, and lower the chance of hitting API rate limits.

requests_cache offers various caching backends like in-memory, SQLite, and others, and it provides a simple configuration API to set cache expiration times, filtering rules, and other cache-related settings. It works by intercepting requests library calls, returning cached data when available and fetching new data when needed.

Here are some key features of the requests_cache library:

  1. Transparent caching: The library integrates seamlessly with the requests library, making it easy to add caching to your existing code without significant modifications.

  2. Cache expiration: You can configure the cache to expire after a specified duration, ensuring that your application uses fresh data when needed.

  3. Flexible storage backends: requests_cache supports various storage backends, such as in-memory storage, SQLite, and others like Redis, which allows you to choose the appropriate storage solution for your application.

  4. Customizable cache keys: The library allows you to define custom cache key functions, enabling fine-grained control over cache lookups.

  5. Filtering and cache control: You can configure cache rules to include or exclude specific requests based on URLs, request methods, or response headers.

To use requests_cache, you can install it using pip/pip3:

pip3 install requests-cache

After installing the library, you can use it in your Python script by importing it and calling the install_cache() function:

import requests
import requests_cache

# Set up a cache that expires after 300 seconds (5 minutes)
cache_expire_after = 300
requests_cache.install_cache("api_cache", expire_after=cache_expire_after)

With this setup, any requests made using the requests library will automatically use the cache, and you can access the cache-related properties in the response object, such as response.from_cache.

2. Throttling: Introduce delays between API requests or implement a queue to spread out requests over time, staying within the API's rate limits.

Here's an example in Python that demonstrates throttling using the time.sleep() function:

import requests
import time

# Number of requests you want to make
num_requests = 5

# Time delay in seconds between requests
throttle_delay = 2

# Note: in practice the openVuln API requires an OAuth2 bearer token,
# passed via the headers argument (omitted here for brevity)
url = "https://apix.cisco.com/security/advisories/v2"

for i in range(num_requests):
    response = requests.get(url)
    
    if response.status_code == 200:
        print(f"Request {i+1}: Successfully fetched data.")
    else:
        print(f"Request {i+1}: Failed with status code {response.status_code}")
    
    # Wait for the specified throttle delay before making the next request
    if i < num_requests - 1:
        time.sleep(throttle_delay)
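
The fixed delay above is simple but conservative. A sliding-window limiter can enforce both published limits (5 calls per second and 30 calls per minute) at once while still allowing short bursts. This is a rough sketch, not an official client; the limit values are taken from the table above:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Block until a call is allowed under every (max_calls, window_seconds)
    rule. Sketch only: tracks recent call timestamps per rule."""

    def __init__(self, limits=((5, 1.0), (30, 60.0))):
        # One timestamp deque per (max_calls, window_seconds) rule
        self.limits = [(max_calls, window, deque()) for max_calls, window in limits]

    def acquire(self):
        while True:
            now = time.monotonic()
            wait = 0.0
            for max_calls, window, stamps in self.limits:
                # Drop timestamps that have aged out of this window
                while stamps and now - stamps[0] >= window:
                    stamps.popleft()
                if len(stamps) >= max_calls:
                    # Must wait until the oldest call leaves the window
                    wait = max(wait, window - (now - stamps[0]))
            if wait == 0.0:
                for _, _, stamps in self.limits:
                    stamps.append(now)
                return
            time.sleep(wait)

limiter = SlidingWindowLimiter()
# Call limiter.acquire() immediately before each requests.get(...)
```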

3. Prioritization: Prioritize requests based on their importance and avoid making non-essential requests when rate limits are close to being reached.

4. Pagination and filtering: Use pagination to request data in smaller chunks and apply filtering to request only the specific data needed, reducing the overall number of API calls.
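
The chunking idea in point 4 can be sketched as a URL builder. Note that the parameter names pageIndex and pageSize below are placeholders, not necessarily the API's documented pagination parameters:

```python
from urllib.parse import urlencode

def paged_urls(base_url, page_size, num_pages,
               page_param="pageIndex", size_param="pageSize"):
    """Yield one request URL per page of results.

    Sketch only: page_param/size_param are hypothetical names; check the
    API documentation for the real pagination parameters."""
    for page in range(1, num_pages + 1):
        query = urlencode({page_param: page, size_param: page_size})
        yield f"{base_url}?{query}"
```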

5. Exponential backoff: In case of rate limit errors or server-side errors, implement an exponential backoff strategy to progressively increase the wait time before retrying the request. This prevents overwhelming the server with retries.
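
Here is a minimal sketch of exponential backoff with full jitter, where fetch stands for any zero-argument callable that raises on a rate-limit or server error (for example, a wrapper around requests.get that calls response.raise_for_status()):

```python
import random
import time

def get_with_backoff(fetch, max_retries=5, base=1.0, cap=60.0):
    """Retry fetch() with exponentially growing, jittered waits.

    Sketch only: waits a random amount between 0 and
    min(cap, base * 2**attempt) seconds after each failure, so
    concurrent clients do not retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```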

6. Monitor usage: Keep track of your API usage to avoid hitting rate limits unexpectedly. Be aware of any changes to the API's rate limits and adjust your client's usage accordingly.

7. Opt for bulk operations: If the API supports bulk operations (e.g., batch requests), use them to combine multiple requests into a single call, reducing the total number of calls made.

By following these best practices, API clients can minimize their impact on the API service while still enjoying its benefits and functionality. Please share any additional recommendations based on your experience.

What about using Redis or something similar?

Using Redis to store API results provides several benefits, including:

  1. Improved performance and reduced latency: Redis is an in-memory data store, which means that it can serve data much faster compared to traditional disk-based databases. By caching API results in Redis, you can significantly reduce the time it takes to access the data and improve your application's overall performance.

  2. Reduced API calls: By storing API results in Redis, your application can retrieve the data from the cache instead of making repeated requests to the API. This can help you avoid hitting rate limits imposed by the API provider and reduce the load on their servers.

  3. Increased reliability: If the API experiences downtime or connectivity issues, your application can still access cached data from Redis, ensuring that your application remains functional even when the API is unavailable.

  4. Scalability: Redis is highly scalable and can handle a large number of concurrent read and write operations. By using Redis to cache API results, you can easily scale your application to handle more users and requests without putting additional load on the API.

  5. Flexibility: Redis supports a variety of data structures, such as strings, lists, sets, and hashes. This flexibility allows you to store and manipulate API results in various ways, depending on your application's requirements.

  6. Expiration and cache management: Redis provides built-in support for setting expiration times on keys. This feature makes it easy to implement cache invalidation strategies, ensuring that your application always uses fresh data when needed.

  7. Real-time processing: Redis can be used as a message broker or for real-time stream processing. By storing API results in Redis, you can enable real-time processing and analysis of the data, which can be useful for applications that require real-time insights or decision-making.

In summary, using Redis to store API results can significantly improve the performance, reliability, and scalability of your application while reducing the load on the API servers.
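
As a concrete illustration of the cache-aside pattern described above, here is a sketch of a helper that works against any client exposing get and setex (redis-py's redis.Redis provides both). The in-memory stub is only a stand-in so the example runs without a Redis server:

```python
import json

class FakeRedis:
    """Stand-in for redis.Redis so this sketch runs without a server."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl_seconds, value):
        # TTL ignored in this stub; a real Redis server expires the key
        self.store[key] = value

def cached_fetch(client, key, fetch, ttl_seconds=300):
    """Cache-aside: return the cached value for key if present; otherwise
    call fetch(), store the JSON-encoded result with a TTL, and return it."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    result = fetch()
    client.setex(key, ttl_seconds, json.dumps(result))
    return result

client = FakeRedis()  # in real use: redis.Redis(host="localhost", port=6379)
```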

There are several other caching solutions and in-memory data stores that you can use to cache API results. Some popular alternatives to Redis include:

  1. Memcached: Memcached is a distributed, in-memory key-value store designed for caching. It is simple, fast, and widely used in many applications for caching purposes. However, it offers fewer data structures and features compared to Redis.

  2. In-memory cache with Python: You can implement an in-memory cache in Python using dictionaries or other data structures to store API results. However, this approach is limited by the available memory of the machine running the application and lacks features like persistence, expiration, and distribution.

  3. Local file cache: You can store API results in local files on the filesystem. This approach offers persistence and simplicity, but it is slower than in-memory caches and may not be suitable for high-performance applications.

  4. SQLite: SQLite is a lightweight, serverless, and self-contained SQL database engine. You can use SQLite as a caching solution by storing API results in a local database file. SQLite offers more advanced querying capabilities than key-value stores but may have lower performance for simple caching scenarios.

Each of these caching solutions has its own set of advantages, limitations, and use cases. The best option for your application depends on factors like the required features, performance, scalability, ease of use, and integration with your existing infrastructure.
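
For alternative 2 above, a dictionary plus timestamps often suffices. This minimal sketch has no persistence or distribution and is bounded only by the host's memory:

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiration (sketch only)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self.entries[key]  # expired: drop the entry and miss
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (time.monotonic() + self.ttl, value)
```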

Once again, please share your recommendations based on your experience. Hope this helps, and thank you in advance for sharing your knowledge.

 
