API Priority and Fairness
API Priority and Fairness (APF) in Kubernetes is a sophisticated system designed to manage how the Kubernetes API server handles incoming requests, especially during times of high traffic or overload. Its main goal is to ensure that critical requests get through while preventing the server from becoming overwhelmed or unresponsive.
Here’s a closer look at APF’s purpose, mechanisms, and benefits:
Purpose of API Priority and Fairness
The Kubernetes API server can become overloaded when too many requests arrive simultaneously. While there are basic controls like --max-requests-inflight to limit the number of concurrent requests, these controls don’t guarantee that the most important requests are prioritized. APF improves on these by classifying and isolating requests more finely, ensuring fair access and prioritization.
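To see why a single global limit falls short, consider a minimal sketch (our own illustration, not Kubernetes code) of what a flag like --max-requests-inflight effectively does: one shared counter that admits requests first-come, first-served, with no notion of who sent them or how important they are.

```python
import threading

class SimpleInflightLimiter:
    """Illustrative sketch of a coarse global in-flight limit
    (class and method names are ours, not the apiserver's)."""

    def __init__(self, max_inflight: int):
        self.sem = threading.Semaphore(max_inflight)

    def try_admit(self) -> bool:
        # Non-blocking: reject immediately when the server is full,
        # regardless of whether the request is critical or not.
        return self.sem.acquire(blocking=False)

    def release(self) -> None:
        self.sem.release()

limiter = SimpleInflightLimiter(max_inflight=2)
print(limiter.try_admit())  # True
print(limiter.try_admit())  # True
print(limiter.try_admit())  # False: a leader-election call is rejected
                            # just like any bulk list request
```

Under this scheme, whichever requests arrive first win the slots, so a flood of low-value traffic can crowd out critical system calls. APF replaces this single counter with per-priority-level accounting.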
How APF Works
- Classification of Requests (FlowSchemas): Incoming requests are classified based on attributes such as the user, the resource being accessed, and the type of operation. This classification assigns each request to a specific priority level using FlowSchemas.
- Priority Levels (PriorityLevelConfigurations): Each priority level has its own concurrency limit, meaning it can handle only a certain number of requests at once. This isolation prevents low-priority or misbehaving clients from starving higher-priority requests. For example, leader election requests and built-in controller requests have dedicated priority levels separate from general user requests.
- Concurrency Limits and Seats: The system uses the concept of “seats” to represent units of concurrency. Each request occupies one or more seats depending on its resource intensity. For instance, list requests that return many objects consume more seats because they are heavier on the server.
- Fair Queuing and Shuffle Sharding: Within each priority level, requests are further divided into flows (e.g., by user or namespace). APF uses a fair queuing algorithm to ensure that no single flow can dominate the queue and starve others. Shuffle sharding is used to efficiently assign requests to queues, isolating low-intensity flows from high-intensity ones.
- Queuing Instead of Immediate Rejection: APF introduces limited queuing to handle brief bursts of traffic gracefully. Instead of rejecting excess requests immediately, it queues them, reducing failures during short spikes.
- Exemptions: Some requests, such as those from system administrators or critical system components, are exempt from APF limits to ensure the API server remains responsive.
- Dynamic Adjustment: The concurrency limits for priority levels are periodically adjusted. Under-utilized priority levels can lend capacity to busier ones temporarily, optimizing resource use.
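The shuffle-sharding idea above can be sketched briefly. The real logic lives in the kube-apiserver's flowcontrol machinery; the Python below is a simplified illustration under assumed names: each flow deterministically hashes to a small "hand" of candidate queues, and a request goes to the shortest queue in its hand, so a heavy flow fills its own hand before it can affect most other flows.

```python
import hashlib

def shuffle_shard(flow_id: str, num_queues: int, hand_size: int) -> list[int]:
    """Derive a deterministic hand of distinct candidate queue indices
    for this flow (illustrative, not the apiserver's exact dealer)."""
    digest = int.from_bytes(hashlib.sha256(flow_id.encode()).digest(), "big")
    remaining = list(range(num_queues))
    hand = []
    for _ in range(hand_size):
        # Consume part of the hash to pick one of the remaining queues.
        digest, idx = divmod(digest, len(remaining))
        hand.append(remaining.pop(idx))
    return hand

def enqueue(queues: list[list], flow_id: str, request, hand_size: int = 2) -> int:
    # Place the request on the shortest queue in the flow's hand; other
    # flows with different hands are largely unaffected by this flow.
    hand = shuffle_shard(flow_id, len(queues), hand_size)
    target = min(hand, key=lambda i: len(queues[i]))
    queues[target].append(request)
    return target
```

Because the hand is derived from a hash of the flow identity, the same flow always lands on the same small set of queues, while two different flows are unlikely to share an entire hand.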
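Seat accounting can likewise be sketched in a few lines. This is our own simplified model (the names are not the apiserver's): a priority level owns a fixed pool of seats, a cheap GET takes one seat, and a wide LIST occupies several at once, so it counts proportionally against the level's concurrency budget.

```python
class PriorityLevel:
    """Illustrative seat-based concurrency accounting for one priority
    level (a simplified model, not Kubernetes source code)."""

    def __init__(self, total_seats: int):
        self.total_seats = total_seats
        self.occupied = 0

    def try_admit(self, seats_needed: int) -> bool:
        # Admit only if the request's seats fit in the remaining budget;
        # the real system would queue (or reject) otherwise.
        if self.occupied + seats_needed <= self.total_seats:
            self.occupied += seats_needed
            return True
        return False

    def finish(self, seats_held: int) -> None:
        self.occupied -= seats_held

pl = PriorityLevel(total_seats=10)
assert pl.try_admit(1)       # a small GET takes one seat
assert pl.try_admit(8)       # a heavy LIST takes several at once
assert not pl.try_admit(2)   # only one seat left, so this must wait
```

This is why a few expensive list requests can saturate a priority level even when the raw request count looks low.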
Benefits of API Priority and Fairness
- Improved Stability: Prevents the API server from crashing or becoming unresponsive under heavy load.
- Fair Access: Ensures that no single user or controller can monopolize API server resources.
- Prioritization: Guarantees that critical system operations like leader election and controller actions get priority access.
- Graceful Handling of Bursts: Queuing allows the system to absorb short bursts of traffic without dropping requests.
- Customizability: Cluster administrators can configure priority levels, concurrency shares, and queuing behavior to fit their workloads.
- Observability: Kubernetes exposes detailed metrics about APF performance, helping administrators monitor and tune the system.
Additional Considerations
- Long-Running Requests: Some long-running requests like remote command execution are not subject to APF.
- Recursive Server Calls: Care must be taken in scenarios where servers call each other recursively to avoid deadlocks or priority inversions.
- Health Check Requests: By default, health check requests from kubelets are treated as normal traffic but can be exempted from rate limiting with additional configuration.
In summary, API Priority and Fairness is a powerful mechanism in Kubernetes that smartly manages API server traffic by classifying requests, enforcing concurrency limits, and ensuring fair and prioritized access. This helps maintain cluster stability and responsiveness even under heavy or bursty workloads.


