Resource Management

Kipper automatically manages CPU and memory for your apps so you do not have to think about Kubernetes resource requests and limits. It monitors actual usage and adjusts allocations to match. It scales up when apps need more and scales down when they are over-provisioned.

Auto mode (default)

A background controller monitors resource usage via metrics-server every 60 seconds. When it detects sustained high or low usage, it adjusts CPU and memory requests and limits automatically.

How it works

| Condition  | Threshold                           | Action                                           |
|------------|-------------------------------------|--------------------------------------------------|
| High usage | Above 80% for 3 consecutive checks  | Increase by 50%                                  |
| Low usage  | Below 20% for 3 consecutive checks  | Halve (with minimums)                            |
| OOM kill   | Immediate                           | Double memory (capped; see OOM memory cap below) |
| Stuck pod  | In ContainerCreating for 5+ minutes | Delete pod to trigger recreation                 |

The controller only acts when usage is consistently high or low. A single spike does not trigger a scale-up, and a brief idle period does not trigger a scale-down. That way, temporary load changes don't cause thrashing.
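In sketch form, the hysteresis amounts to a pair of streak counters per workload. The following Go is illustrative only (hypothetical names, not Kipper's implementation), mirroring the thresholds in the table above:

go
// Illustrative sketch of the 3-consecutive-checks hysteresis.
package controller

const (
	highWatermark = 0.80 // scale up above this fraction of the request
	lowWatermark  = 0.20 // scale down below it
	requiredTicks = 3    // consecutive 60-second checks before acting
)

type streaks struct {
	high, low int
}

// observe records one usage sample (as a fraction of the current
// request) and returns the action to take, if any.
func (s *streaks) observe(usage float64) string {
	switch {
	case usage > highWatermark:
		s.high, s.low = s.high+1, 0
	case usage < lowWatermark:
		s.low, s.high = s.low+1, 0
	default:
		s.high, s.low = 0, 0 // one normal reading resets both streaks
	}
	switch {
	case s.high >= requiredTicks:
		s.high = 0
		return "increase by 50%"
	case s.low >= requiredTicks:
		s.low = 0
		return "halve (respecting profile minimums)"
	}
	return ""
}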

Profile-based minimums

The controller never scales below the resource profile defaults. This prevents databases and heavy applications from being starved:

| Profile     | Min CPU request | Min CPU limit     | Min memory |
|-------------|-----------------|-------------------|------------|
| lightweight | 50m             | 50m               | 64 Mi      |
| standard    | 100m            | 100m              | 128 Mi     |
| database    | 250m            | 250m              | 256 Mi     |
| jvm         | 100m            | 1000m (burstable) | 2 Gi       |

The jvm profile is burstable on purpose: the request stays low so pods schedule on small nodes, but the limit is high so cold-start JIT compilation can use a full core for a few minutes without that capacity being permanently reserved. JVM apps spend most of their time idle and only need that headroom during startup.

Database services (PostgreSQL, MySQL, MongoDB, OpenSearch) automatically get the database profile.

OOM memory cap

OOM doubling is capped at 50% of total node allocatable memory (minimum 8 Gi). On a 16 GB node, the cap is 8 Gi. If an OOM-killed pod is already at the cap, the controller creates a critical alert instead of doubling further.

All values are rounded to clean boundaries: CPU to the nearest 50m, memory to the nearest 64 Mi. If rounding would produce the same value as the current setting, the controller skips the update.
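A minimal sketch of that rounding and skip logic, with hypothetical helpers rather than Kipper's actual code:

go
// Round CPU to the nearest 50m and memory to the nearest 64 Mi, then
// skip the update if nothing actually changes. Illustrative only.
package controller

func roundTo(value, step int64) int64 {
	return (value + step/2) / step * step
}

func needsUpdate(currentCPUm, proposedCPUm, currentMemMi, proposedMemMi int64) bool {
	cpu := roundTo(proposedCPUm, 50)  // e.g. 437m -> 450m
	mem := roundTo(proposedMemMi, 64) // e.g. 300 Mi -> 320 Mi
	return cpu != currentCPUm || mem != currentMemMi
}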

Startup grace period

Pods younger than 5 minutes are excluded from CPU and memory calculations. Without this grace period, the controller would react to transient startup spikes. JVM applications, for example, often use 100% CPU during class loading and JIT compilation for several minutes before settling to idle. OOM detection is unaffected and works immediately regardless of pod age.

Saturation override

The grace period protects against transient startup noise, but a pod that is pinned at its CPU limit is not transient. The cgroup is the bottleneck. When any pod sits at 95% or more of its CPU limit, the controller bypasses both the startup grace and the 3-tick hysteresis and bumps CPU immediately.

This catches a specific failure mode: a JVM app whose CPU limit is too low to ever finish JIT compilation. Without the override, the pod would sit at 100% forever and the grace period would keep classifying it as "still starting up". With the override, the controller raises the limit on the next tick, the JIT can finish, and the pod settles to idle.

The override only triggers an increase, never a decrease. The hysteresis still applies to scale-downs.
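As a sketch, the override is a single predicate checked before the grace-period and streak filters (illustrative names, not Kipper's code):

go
// Saturation override: at 95% or more of the CPU limit the controller
// acts immediately, skipping the startup grace period and the streak
// counters. It only ever raises CPU. Illustrative only.
package controller

func cpuSaturated(usageMillicores, limitMillicores int64) bool {
	return limitMillicores > 0 &&
		float64(usageMillicores) >= 0.95*float64(limitMillicores)
}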

Single-replica apps

For apps with a single replica, the controller only scales up and never scales down. Every resource change triggers a pod restart, and with one replica that means a brief outage. Scaling down is only safe with 2+ replicas, where Kubernetes performs a rolling update and at least one pod stays up.

The Scale tab in the web console shows a message explaining this when an app has one replica and auto mode is active.

Autoscaling (HPA)

Autoscaling adjusts the number of pods based on CPU and memory utilisation. It works independently from the resource controller, which adjusts CPU and memory per pod. Together, they give you both vertical scaling (right-sized pods) and horizontal scaling (right number of pods).

How the two controllers interact

| Concern                                | Who owns it                     | What it does                                       |
|----------------------------------------|---------------------------------|----------------------------------------------------|
| CPU and memory per pod                 | Resource controller (auto mode) | Monitors usage, adjusts requests and limits        |
| Number of pods (replicas)              | HPA (Kubernetes built-in)       | Monitors utilisation %, scales between min and max |
| Deployment shape (image, env, volumes) | App reconciler                  | Syncs the Deployment to match the App CR           |

When autoscaling is enabled, the App reconciler stops writing spec.replicas to the Deployment and lets the HPA own that field. When autoscaling is disabled, the App reconciler owns replicas again.
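A sketch of that handover, with hypothetical types standing in for Kipper's actual reconciler:

go
// When autoscaling is on, the reconciler returns nil and omits
// spec.replicas from its apply, leaving the field to the HPA.
// Hypothetical types, not Kipper's actual code.
package controller

type AutoscaleSpec struct {
	Enabled                  bool
	MinReplicas, MaxReplicas int32
	CPUTarget                int32
}

type AppSpec struct {
	Replicas  int32
	Autoscale *AutoscaleSpec
}

func desiredReplicas(spec AppSpec) *int32 {
	if spec.Autoscale != nil && spec.Autoscale.Enabled {
		return nil // HPA owns deployment.spec.replicas
	}
	r := spec.Replicas
	return &r
}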

When to use what

The resource controller and autoscaling solve different problems. They complement each other, but you don't always need both.

Resource management only (auto mode, no autoscaling)

Best for apps with predictable traffic where you don't know the right CPU and memory values yet: a small internal tool, a staging environment, a service that handles a steady stream of background jobs. Kipper figures out the right size over time. You don't need multiple replicas; you just need the pod to be the right size.

Autoscaling only (expert mode with HPA)

Best when you know exactly how much CPU and memory each pod needs, but traffic varies. A public API that gets 10 requests per second at night and 500 during business hours. You've profiled the app and set the resources yourself. You just need Kubernetes to add and remove pods as load changes.

Both together

Best for production apps where traffic varies AND you want Kipper to handle the right-sizing automatically. The resource controller finds the right CPU and memory per pod over time. The HPA handles traffic spikes by adding pods quickly, without any restarts. When a traffic spike hits, the HPA responds in seconds by adding pods. The resource controller only adjusts resources after sustained changes over minutes.

Here's a typical sequence with both enabled:

  1. App starts with standard profile defaults (100m CPU, 128 Mi memory)
  2. Resource controller watches usage over a few minutes and adjusts. Maybe the app actually needs 500m CPU. That triggers one rolling restart, but the HPA ensures 2+ pods, so there's no downtime.
  3. A traffic spike hits. CPU goes above 70% across all pods.
  4. The HPA adds pods within seconds. No restarts, just more pods handling requests.
  5. The resource controller still watches per-pod usage. While the HPA has scaled out, the controller will not decrease per-pod resources (more pods doesn't justify shrinking each one), but it can still increase CPU or memory if pods are saturated. This handles the case where horizontal scaling alone is not enough, for example JVM apps stuck at a too-low per-pod CPU ceiling.
  6. Traffic drops. The HPA removes the extra pods.
  7. If baseline usage is still higher than before, the resource controller will eventually adjust. But only after sustained readings, not from a temporary spike.

Common scenarios

| Your situation                   | Recommended setup                                            |
|----------------------------------|--------------------------------------------------------------|
| Small internal tool, one user    | Auto mode only, 1 replica                                    |
| Staging environment, testing     | Auto mode only, 1 replica                                    |
| Production API, steady traffic   | Auto mode, 2 replicas (no autoscaling)                       |
| Production API, variable traffic | Auto mode + autoscaling, min 2 / max 5                       |
| JVM app you've already tuned     | Expert mode + autoscaling                                    |
| Database or cache                | Auto mode only (databases should not be horizontally scaled) |
| Batch worker, periodic spikes    | Auto mode + autoscaling based on CPU                         |

Enabling autoscaling

From the Scale tab in the web console, toggle Autoscaling on. Set the minimum and maximum replicas and a CPU target percentage. Click Save autoscaling.

From the CLI or GitOps:

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: api
  namespace: your-name-prod
spec:
  image: registry.example.com/api:latest
  port: 8080
  autoscale:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    cpuTarget: 70

The HPA checks metrics every 15 seconds. When average CPU across all pods exceeds the target, it adds pods. When utilisation drops, it removes pods (down to minReplicas).
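The HPA itself is a standard Kubernetes autoscaling/v2 object. For the App above, the generated HPA would be roughly equivalent to the following (a sketch; the object Kipper actually creates may carry additional labels or behaviour settings):

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: your-name-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70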

Set minReplicas to at least 2 when using auto mode. This gives two benefits:

  1. The resource controller can safely scale resources down. With 2+ replicas, Kubernetes performs a rolling update so at least one pod stays available during the restart
  2. Your app has basic high availability. If one pod crashes, the other continues serving traffic

A good starting point for most apps:

| Setting       | Value        |
|---------------|--------------|
| Min replicas  | 2            |
| Max replicas  | 5            |
| CPU target    | 70%          |
| Memory target | 0 (disabled) |

Memory-based autoscaling is usually less useful because most applications do not release memory when load drops. CPU-based scaling responds faster to actual load changes.

What happens under the hood

  1. You enable autoscaling on the App CR
  2. The App reconciler creates an HPA targeting the app's Deployment
  3. The HPA reads CPU metrics from metrics-server and adjusts deployment.spec.replicas
  4. The resource controller independently adjusts CPU and memory requests based on per-pod usage
  5. When the App reconciler runs (e.g. after an image update), it updates the Deployment template but preserves the replica count set by the HPA

Disabling autoscaling

Toggle autoscaling off in the Scale tab and click Save autoscaling. The HPA is deleted and the App reconciler takes over replica management again, setting replicas to app.Spec.Replicas (defaults to 1).

OOM recovery

When a pod is terminated by the kernel for exceeding its memory limit (OOMKilled), the controller doubles the memory immediately, without waiting for 3 consecutive checks. This handles cases where an app needs significantly more memory than its initial allocation, such as a Java application starting with 64 Mi but requiring 512 Mi+ for the JVM.

The controller detects OOM kills even when the pod is in a crash loop and has no metrics. It checks the pod's termination state directly from the Kubernetes API, not just from metrics data.
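In sketch form, that check reads the last termination state from the pod status, which is available even when metrics are not. This uses the standard k8s.io/api types but is not Kipper's actual code:

go
// Detect an OOM kill straight from the Kubernetes API. Works for
// crash-looping pods that never report metrics. Illustrative only.
package controller

import corev1 "k8s.io/api/core/v1"

func wasOOMKilled(pod *corev1.Pod) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		if t := cs.LastTerminationState.Terminated; t != nil && t.Reason == "OOMKilled" {
			return true
		}
	}
	return false
}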

Resource profiles

When an app has no resource requests configured, the controller applies defaults based on the app's resource profile label (kipper.run/resource-profile):

| Profile       | CPU request | CPU limit | Memory | Use case                                           |
|---------------|-------------|-----------|--------|----------------------------------------------------|
| lightweight   | 50m         | 50m       | 64 Mi  | Static sites, proxies, lightweight APIs            |
| standard      | 100m        | 100m      | 128 Mi | Typical web applications (default)                 |
| compute-heavy | 500m        | 500m      | 256 Mi | Image processing, data transformation              |
| memory-heavy  | 100m        | 100m      | 512 Mi | Caching layers, in-memory databases, ML inference  |
| database      | 250m        | 250m      | 256 Mi | PostgreSQL, MySQL, MongoDB, OpenSearch             |
| jvm           | 100m        | 1000m     | 2 Gi   | Java/JVM applications, Spring Boot, heavy runtimes |

The jvm profile is the only burstable profile by default. Most workloads run with request equal to limit (Guaranteed QoS) so they get exactly what they ask for. JVMs are different: they spike during startup and idle the rest of the time, so the request stays low (so the pod schedules) but the limit is high (so JIT can finish). On the same node, you can run six JVM apps reserving 600m total but each one able to burst to a full core during cold start.

If no profile label is set, standard is used. Database services automatically get the database profile.
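For example, to pin an app to a profile explicitly, set the label in the App CR's metadata (the app name and image below are placeholders):

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: worker
  namespace: your-name-prod
  labels:
    kipper.run/resource-profile: memory-heavy
spec:
  image: registry.example.com/worker:latest
  port: 8080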

Custom resources

For workloads that don't fit any profile (like a Java application with -Xms4g or a data pipeline needing 8 Gi), you can set explicit CPU and memory values at deploy time.

From the CLI:

bash
kip app deploy --name exchange-service --image registry.example.com/exchange:latest \
  --port 8080 --memory 4Gi --cpu 1

The CLI's --memory and --cpu flags set request and limit to the same value (Guaranteed QoS). If you need burstable CPU (a different request and limit), set them in the web console or by editing the App CR directly.

From the web console:

Select Custom... from the resource profile dropdown when deploying an app. Two fields appear for memory and CPU. Use Kubernetes resource notation: 256Mi, 1Gi, 4Gi for memory; 250m, 500m, 1, 2 for CPU.

For an existing app, open the Settings tab and click Advanced (request & limit) in the resource limits panel. Four fields appear: CPU request, CPU limit, memory request, memory limit. Set request lower than limit for burstable workloads. The form opens in advanced mode automatically when an app already has different request and limit values.

Custom values override the profile defaults. The auto controller still adjusts from there based on actual usage. Your values are the starting point, not a ceiling.

Resource log

Every change the controller makes is logged and visible under Settings in the web console. The log shows:

  • Time: when the change happened
  • App and namespace: which workload was adjusted
  • Action: what changed (increased memory, decreased CPU, applied defaults)
  • From / To: old and new values
  • Reason: why the change was made (usage at 92%, OOM kill detected)

The system retains the most recent 50 log entries.

Expert mode

Switch to expert mode when you want full control over resource allocation. The auto controller stops making changes, and all CPU and memory values are set manually through the Resources tab in the app detail panel.

Toggle between modes in Settings in the web console. Only admins can change the mode.

PUT /api/v1/settings/mode
{"mode": "auto"}    // or "expert"

In expert mode, you can still view the resource log to see what the controller changed before you switched.

Alerts

Every action the controller takes generates an alert visible in the console bell icon:

  • Critical (red): OOM kills, emergency memory doubling
  • Warning (yellow): resource increases, stuck pod recovery
  • Info (green): scale-downs, default profile application

See Alerts for details on the alerting system and Slack integration.

Slack notifications

Resource changes can be forwarded to Slack. See Configuration for setup.

What Kipper manages

The auto controller manages resources for Kipper workloads defined as Custom Resources (kipper.run/v1alpha1):

  • Apps: web apps, APIs, frontends
  • Services: databases, caches, message queues
  • Functions: serverless workloads (resources set at creation, not auto-tuned while idle)
  • Jobs: scheduled and one-off batch tasks

It does not manage system components (Traefik, cert-manager, Longhorn) or the KEDA autoscaler itself.

GitOps

Kipper resources are defined as Custom Resource Definitions (CRDs) under kipper.run/v1alpha1. This means you can manage your entire cluster declaratively with tools like ArgoCD or Flux:

yaml
apiVersion: kipper.run/v1alpha1
kind: App
metadata:
  name: api
  namespace: your-name-test
spec:
  image: registry.example.com/api:v2.1.0
  port: 8080
  replicas: 2
  resources:
    profile: jvm
    memoryRequest: "4Gi"
    memoryLimit: "4Gi"
    cpuRequest: "100m"
    cpuLimit: "1500m"
  env:
    LOG_LEVEL: "info"
  route:
    host: api.example.com

Apply with kubectl apply -f app.yaml or commit to a Git repo and let your GitOps tool sync it. Kipper's reconcilers ensure the underlying Kubernetes resources (Deployment, Service, Ingress, Secrets) match the CR spec.

Available CRDs: App, Service, Function, Project, Job, Volume.

For a more user-friendly approach, use the kipper.yaml manifest format with kip apply. See the full GitOps guide for details, including ArgoCD and Flux integration examples.
