Resource Management

Kipper automatically manages CPU and memory for your apps so you do not have to think about Kubernetes resource requests and limits. It monitors actual usage and adjusts allocations to match. It scales up when apps need more and scales down when they are over-provisioned.

Auto mode (default)

A background controller monitors resource usage via metrics-server every 60 seconds. When it detects sustained high or low usage, it adjusts CPU and memory requests and limits automatically.

How it works

| Condition  | Threshold                           | Action                                           |
|------------|-------------------------------------|--------------------------------------------------|
| High usage | Above 80% for 3 consecutive checks  | Increase by 50%                                  |
| Low usage  | Below 20% for 3 consecutive checks  | Halve (with minimums)                            |
| OOM kill   | Immediate                           | Double memory (capped; see OOM memory cap below) |
| Stuck pod  | In ContainerCreating for 5+ minutes | Delete pod to trigger recreation                 |

The controller only acts when usage is consistently high or low. A single spike does not trigger a scale-up, and a brief idle period does not trigger a scale-down. That way, temporary load changes don't cause thrashing.
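In sketch form, the hysteresis amounts to a pair of streak counters per workload. The following Go is illustrative only (hypothetical names, not Kipper's implementation), mirroring the thresholds in the table above:

go
// Illustrative sketch of the 3-consecutive-checks hysteresis.
package controller

const (
	highWatermark = 0.80 // scale up above this fraction of the request
	lowWatermark  = 0.20 // scale down below it
	requiredTicks = 3    // consecutive 60-second checks before acting
)

type streaks struct {
	high, low int
}

// observe records one usage sample (as a fraction of the current
// request) and returns the action to take, if any.
func (s *streaks) observe(usage float64) string {
	switch {
	case usage > highWatermark:
		s.high, s.low = s.high+1, 0
	case usage < lowWatermark:
		s.low, s.high = s.low+1, 0
	default:
		s.high, s.low = 0, 0 // one normal reading resets both streaks
	}
	switch {
	case s.high >= requiredTicks:
		s.high = 0
		return "increase by 50%"
	case s.low >= requiredTicks:
		s.low = 0
		return "halve (respecting profile minimums)"
	}
	return ""
}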

Profile-based minimums

The controller never scales below the resource profile defaults. This prevents databases and heavy applications from being starved:

| Profile     | Min CPU request | Min CPU limit     | Min memory |
|-------------|-----------------|-------------------|------------|
| lightweight | 50m             | 50m               | 64 Mi      |
| standard    | 100m            | 100m              | 128 Mi     |
| database    | 250m            | 250m              | 256 Mi     |
| jvm         | 100m            | 1000m (burstable) | 2 Gi       |

The jvm profile is burstable on purpose: the request stays low so pods schedule on small nodes, but the limit is high so cold-start JIT compilation can use a full core for a few minutes without that capacity being permanently reserved. JVM apps spend most of their time idle and only need that headroom during startup.

Database services (PostgreSQL, MySQL, MongoDB, OpenSearch) automatically get the database profile.

OOM memory cap

OOM doubling is capped at 50% of total node allocatable memory (minimum 8 Gi). On a 16 GB node, the cap is 8 Gi. If an OOM-killed pod is already at the cap, the controller creates a critical alert instead of doubling further.

All values are rounded to clean boundaries: CPU to the nearest 50m, memory to the nearest 64 Mi. If rounding would produce the same value as the current setting, the controller skips the update.
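A minimal sketch of that rounding and skip logic, with hypothetical helpers rather than Kipper's actual code:

go
// Round CPU to the nearest 50m and memory to the nearest 64 Mi, then
// skip the update if nothing actually changes. Illustrative only.
package controller

func roundTo(value, step int64) int64 {
	return (value + step/2) / step * step
}

func needsUpdate(currentCPUm, proposedCPUm, currentMemMi, proposedMemMi int64) bool {
	cpu := roundTo(proposedCPUm, 50)  // e.g. 437m -> 450m
	mem := roundTo(proposedMemMi, 64) // e.g. 300 Mi -> 320 Mi
	return cpu != currentCPUm || mem != currentMemMi
}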

Startup grace period

Pods younger than 5 minutes are excluded from CPU and memory calculations. Without this grace period, the controller would react to transient startup spikes. JVM applications, for example, often use 100% CPU during class loading and JIT compilation for several minutes before settling to idle. OOM detection is unaffected and works immediately regardless of pod age.

Saturation override

The grace period protects against transient startup noise, but a pod that is pinned at its CPU limit is not transient. The cgroup is the bottleneck. When any pod sits at 95% or more of its CPU limit, the controller bypasses both the startup grace and the 3-tick hysteresis and bumps CPU immediately.

This catches a specific failure mode: a JVM app whose CPU limit is too low to ever finish JIT compilation. Without the override, the pod would sit at 100% forever and the grace period would keep classifying it as "still starting up". With the override, the controller raises the limit on the next tick, the JIT can finish, and the pod settles to idle.

The override only triggers an increase, never a decrease. The hysteresis still applies to scale-downs.
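As a sketch, the override is a single predicate checked before the grace-period and streak filters (illustrative names, not Kipper's code):

go
// Saturation override: at 95% or more of the CPU limit the controller
// acts immediately, skipping the startup grace period and the streak
// counters. It only ever raises CPU. Illustrative only.
package controller

func cpuSaturated(usageMillicores, limitMillicores int64) bool {
	return limitMillicores > 0 &&
		float64(usageMillicores) >= 0.95*float64(limitMillicores)
}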

Single-replica apps

For apps with a single replica, the controller only scales up and never scales down. Every resource change triggers a pod restart, and with one replica that means a brief outage. Scaling down is only safe with 2+ replicas, where Kubernetes performs a rolling update and at least one pod stays up.

The Scale tab in the web console shows a message explaining this when an app has one replica and auto mode is active.

Autoscaling (HPA)

Autoscaling adjusts the number of pods based on CPU and memory utilisation. It works independently from the resource controller, which adjusts CPU and memory per pod. Together, they give you both vertical scaling (right-sized pods) and horizontal scaling (right number of pods).

How the two controllers interact

| Concern                                | Who owns it                     | What it does                                       |
|----------------------------------------|---------------------------------|----------------------------------------------------|
| CPU and memory per pod                 | Resource controller (auto mode) | Monitors usage, adjusts requests and limits        |
| Number of pods (replicas)              | HPA (Kubernetes built-in)       | Monitors utilisation %, scales between min and max |
| Deployment shape (image, env, volumes) | App reconciler                  | Syncs the Deployment to match the App CR           |

When autoscaling is enabled, the App reconciler stops writing spec.replicas to the Deployment and lets the HPA own that field. When autoscaling is disabled, the App reconciler owns replicas again.
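A sketch of that handover, with hypothetical types standing in for Kipper's actual reconciler:

go
// When autoscaling is on, the reconciler returns nil and omits
// spec.replicas from its apply, leaving the field to the HPA.
// Hypothetical types, not Kipper's actual code.
package controller

type AutoscaleSpec struct {
	Enabled                  bool
	MinReplicas, MaxReplicas int32
	CPUTarget                int32
}

type AppSpec struct {
	Replicas  int32
	Autoscale *AutoscaleSpec
}

func desiredReplicas(spec AppSpec) *int32 {
	if spec.Autoscale != nil && spec.Autoscale.Enabled {
		return nil // HPA owns deployment.spec.replicas
	}
	r := spec.Replicas
	return &r
}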

When to use what

The resource controller and autoscaling solve different problems. They complement each other, but you don't always need both.

Resource management only (auto mode, no autoscaling)

Best for apps with predictable traffic where you don't know the right CPU and memory values yet: a small internal tool, a staging environment, a service that handles a steady stream of background jobs. Kipper figures out the right size over time. You don't need multiple replicas; you just need the pod to be the right size.

Autoscaling only (expert mode with HPA)

Best when you know exactly how much CPU and memory each pod needs, but traffic varies. A public API that gets 10 requests per second at night and 500 during business hours. You've profiled the app and set the resources yourself. You just need Kubernetes to add and remove pods as load changes.

Both together

Best for production apps where traffic varies AND you want Kipper to handle the right-sizing automatically. The resource controller finds the right CPU and memory per pod over time. The HPA handles traffic spikes by adding pods quickly, without any restarts. When a traffic spike hits, the HPA responds in seconds by adding pods. The resource controller only adjusts resources after sustained changes over minutes.

Here's a typical sequence with both enabled:

  1. App starts with standard profile defaults (100m CPU, 128 Mi memory)
  2. Resource controller watches usage over a few minutes and adjusts. Maybe the app actually needs 500m CPU. That triggers one rolling restart, but the HPA ensures 2+ pods, so there's no downtime.
  3. A traffic spike hits. CPU goes above 70% across all pods.
  4. The HPA adds pods within seconds. No restarts, just more pods handling requests.
  5. The resource controller still watches per-pod usage. While the HPA has scaled out, the controller will not decrease per-pod resources (more pods doesn't justify shrinking each one), but it can still increase CPU or memory if pods are saturated. This handles the case where horizontal scaling alone is not enough, for example JVM apps stuck at a too-low per-pod CPU ceiling.
  6. Traffic drops. The HPA removes the extra pods.
  7. If baseline usage is still higher than before, the resource controller will eventually adjust. But only after sustained readings, not from a temporary spike.

Common scenarios

| Your situation                   | Recommended setup                                            |
|----------------------------------|--------------------------------------------------------------|
| Small internal tool, one user    | Auto mode only, 1 replica                                    |
| Staging environment, testing     | Auto mode only, 1 replica                                    |
| Production API, steady traffic   | Auto mode, 2 replicas (no autoscaling)                       |
| Production API, variable traffic | Auto mode + autoscaling, min 2 / max 5                       |
| JVM app you've already tuned     | Expert mode + autoscaling                                    |
| Database or cache                | Auto mode only (databases should not be horizontally scaled) |
| Batch worker, periodic spikes    | Auto mode + autoscaling based on CPU                         |

Enabling autoscaling

From the Scale tab in the web console, toggle Autoscaling on. Set the minimum and maximum replicas and a CPU target percentage. Click Save autoscaling.

From the CLI or GitOps:

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: api
  namespace: your-name-prod
spec:
  image: registry.example.com/api:latest
  port: 8080
  autoscale:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    cpuTarget: 70

The HPA checks metrics every 15 seconds. When average CPU across all pods exceeds the target, it adds pods. When utilisation drops, it removes pods (down to minReplicas).
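The HPA itself is a standard Kubernetes autoscaling/v2 object. For the App above, the generated HPA would be roughly equivalent to the following (a sketch; the object Kipper actually creates may carry additional labels or behaviour settings):

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: your-name-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70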

Set minReplicas to at least 2 when using auto mode. This gives two benefits:

  1. The resource controller can safely scale resources down. With 2+ replicas, Kubernetes performs a rolling update so at least one pod stays available during the restart
  2. Your app has basic high availability. If one pod crashes, the other continues serving traffic

A good starting point for most apps:

| Setting       | Value        |
|---------------|--------------|
| Min replicas  | 2            |
| Max replicas  | 5            |
| CPU target    | 70%          |
| Memory target | 0 (disabled) |

Memory-based autoscaling is usually less useful because most applications do not release memory when load drops. CPU-based scaling responds faster to actual load changes.

What happens under the hood

  1. You enable autoscaling on the App CR
  2. The App reconciler creates an HPA targeting the app's Deployment
  3. The HPA reads CPU metrics from metrics-server and adjusts deployment.spec.replicas
  4. The resource controller independently adjusts CPU and memory requests based on per-pod usage
  5. When the App reconciler runs (e.g. after an image update), it updates the Deployment template but preserves the replica count set by the HPA

Disabling autoscaling

Toggle autoscaling off in the Scale tab and click Save autoscaling. The HPA is deleted and the App reconciler takes over replica management again, setting replicas to app.Spec.Replicas (defaults to 1).

OOM recovery

When a pod is terminated by the kernel for exceeding its memory limit (OOMKilled), the controller doubles the memory immediately, without waiting for 3 consecutive checks. This handles cases where an app needs significantly more memory than its initial allocation, such as a Java application starting with 64 Mi but requiring 512 Mi+ for the JVM.

The controller detects OOM kills even when the pod is in a crash loop and has no metrics. It checks the pod's termination state directly from the Kubernetes API, not just from metrics data.
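In sketch form, that check reads the last termination state from the pod status, which is available even when metrics are not. This uses the standard k8s.io/api types but is not Kipper's actual code:

go
// Detect an OOM kill straight from the Kubernetes API. Works for
// crash-looping pods that never report metrics. Illustrative only.
package controller

import corev1 "k8s.io/api/core/v1"

func wasOOMKilled(pod *corev1.Pod) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if t := cs.LastTerminationState.Terminated; t != nil && t.Reason == "OOMKilled" {
			return true
		}
	}
	return false
}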

Resource profiles

When an app has no resource requests configured, the controller applies defaults based on the app's resource profile label (kipper.run/resource-profile):

| Profile       | CPU request | CPU limit | Memory | Use case                                           |
|---------------|-------------|-----------|--------|----------------------------------------------------|
| lightweight   | 50m         | 50m       | 64 Mi  | Static sites, proxies, lightweight APIs            |
| standard      | 100m        | 100m      | 128 Mi | Typical web applications (default)                 |
| compute-heavy | 500m        | 500m      | 256 Mi | Image processing, data transformation              |
| memory-heavy  | 100m        | 100m      | 512 Mi | Caching layers, in-memory databases, ML inference  |
| database      | 250m        | 250m      | 256 Mi | PostgreSQL, MySQL, MongoDB, OpenSearch             |
| jvm           | 100m        | 1000m     | 2 Gi   | Java/JVM applications, Spring Boot, heavy runtimes |

The jvm profile is the only burstable profile by default. Most workloads run with request equal to limit (Guaranteed QoS) so they get exactly what they ask for. JVMs are different: they spike during startup and idle the rest of the time, so the request stays low (so the pod schedules) but the limit is high (so JIT can finish). On the same node, you can run six JVM apps reserving 600m total but each one able to burst to a full core during cold start.

If no profile label is set, standard is used. Database services automatically get the database profile.
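For example, to pin an app to a profile explicitly, set the label in the App CR's metadata (the app name and image below are placeholders):

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: worker
  namespace: your-name-prod
  labels:
    kipper.run/resource-profile: memory-heavy
spec:
  image: registry.example.com/worker:latest
  port: 8080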

Custom resources

For workloads that don't fit any profile (like a Java application with -Xms4g or a data pipeline needing 8 Gi), you can set explicit CPU and memory values at deploy time.

From the CLI:

bash
kip app deploy --name exchange-service --image registry.example.com/exchange:latest \
  --port 8080 --memory 4Gi --cpu 1

The CLI's --memory and --cpu flags set request and limit to the same value (Guaranteed QoS). If you need burstable CPU (a different request and limit), set them in the web console or by editing the App CR directly.

From the web console:

Select Custom... from the resource profile dropdown when deploying an app. Two fields appear for memory and CPU. Use Kubernetes resource notation: 256Mi, 1Gi, 4Gi for memory; 250m, 500m, 1, 2 for CPU.

For an existing app, open the Settings tab and click Advanced (request & limit) in the resource limits panel. Four fields appear: CPU request, CPU limit, memory request, memory limit. Set request lower than limit for burstable workloads. The form opens in advanced mode automatically when an app already has different request and limit values.

Custom values override the profile defaults. The auto controller still adjusts from there based on actual usage. Your values are the starting point, not a ceiling.

Resource log

Every change the controller makes is logged and visible under Settings in the web console. The log shows:

  • Time: when the change happened
  • App and namespace: which workload was adjusted
  • Action: what changed (increased memory, decreased CPU, applied defaults)
  • From / To: old and new values
  • Reason: why the change was made (usage at 92%, OOM kill detected)

The system retains the most recent 50 log entries.

Expert mode

Switch to expert mode when you want full control over resource allocation. The auto controller stops making changes, and all CPU and memory values are set manually through the Resources tab in the app detail panel.

Toggle between modes in Settings in the web console. Only admins can change the mode.

PUT /api/v1/settings/mode
{"mode": "auto"}    // or "expert"

In expert mode, you can still view the resource log to see what the controller changed before you switched.

Alerts

Every action the controller takes generates an alert visible in the console bell icon:

  • Critical (red): OOM kills, emergency memory doubling
  • Warning (yellow): resource increases, stuck pod recovery
  • Info (green): scale-downs, default profile application

See Alerts for details on the alerting system and Slack integration.

Slack notifications

Resource changes can be forwarded to Slack. See Configuration for setup.

What Kipper manages

The auto controller manages resources for Kipper workloads defined as Custom Resources (kipper.run/v1alpha1):

  • Apps: web apps, APIs, frontends
  • Services: databases, caches, message queues
  • Functions: serverless workloads (resources set at creation, not auto-tuned while idle)
  • Jobs: scheduled and one-off batch tasks

It does not manage system components (Traefik, cert-manager, Longhorn) or the KEDA autoscaler itself.

GitOps

Kipper resources are defined as Custom Resource Definitions (CRDs) under kipper.run/v1alpha1. This means you can manage your entire cluster declaratively with tools like ArgoCD or Flux:

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: api
  namespace: your-name-test
spec:
  image: registry.example.com/api:v2.1.0
  port: 8080
  replicas: 2
  resources:
    profile: jvm
    memoryRequest: "4Gi"
    memoryLimit: "4Gi"
    cpuRequest: "100m"
    cpuLimit: "1500m"
  env:
    LOG_LEVEL: "info"
  route:
    host: api.example.com

Apply with kubectl apply -f app.yaml or commit to a Git repo and let your GitOps tool sync it. Kipper's reconcilers ensure the underlying Kubernetes resources (Deployment, Service, Ingress, Secrets) match the CR spec.

Available CRDs: App, Service, Function, Project, Job, Volume.

For a more user-friendly approach, use the kipper.yaml manifest format with kip apply. See the full GitOps guide for details, including ArgoCD and Flux integration examples.
