Observability
Kipper includes a built-in observability stack for production monitoring: Loki for logs, Prometheus for metrics, and Grafana for dashboards.
All three are installed automatically during kip install and configured to work together out of the box.
Accessing Grafana
Grafana is available at https://grafana-<your-domain>, for example:

```
https://grafana-46-225-91-12.kipper.run
```

Default credentials:
- Username: admin
- Password: kipper
Change the password after first login.
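The password can also be rotated from the command line through Grafana's HTTP API. A sketch, assuming the example hostname above; substitute your own Grafana URL and a real password:

```
# Change the admin password via Grafana's HTTP API (PUT /api/user/password).
curl -X PUT https://grafana-46-225-91-12.kipper.run/api/user/password \
  -u admin:kipper \
  -H "Content-Type: application/json" \
  -d '{"oldPassword": "kipper", "newPassword": "new-password", "confirmNew": "new-password"}'
```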
What's included
Loki: Log aggregation
Loki collects logs from all pods across all namespaces. Unlike streaming logs from a single pod (kip app logs), Loki gives you:
- Persistent logs: survive pod restarts and crashes
- Searchable: filter by app, namespace, time range, or text content
- Multi-pod: see logs from all replicas of an app in one view
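Loki can also be queried outside Grafana through its HTTP API. A sketch, assuming the Loki service is named `loki` and listens on port 3100 in the `monitoring` namespace (these names are assumptions; check with `kubectl -n monitoring get svc`):

```
# Port-forward Loki locally, then run a LogQL query against its HTTP API.
kubectl -n monitoring port-forward svc/loki 3100:3100 &
curl -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={app="domain-service"} |= "ERROR"'
```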
In Grafana, go to Explore → select Loki as the data source → query with LogQL:
```logql
{namespace="your-name-test", app="domain-service"}
```

Filter for errors:

```logql
{namespace="your-name-test", app="domain-service"} |= "ERROR"
```

Prometheus: Metrics
Prometheus collects CPU, memory, network, and request metrics from all pods and nodes. Pre-configured with:
- Node exporter: CPU, memory, disk, network per node
- kube-state-metrics: pod status, deployment health, replica counts
- Pod metrics: CPU and memory usage per container
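These series can be combined with PromQL functions. For example, a hedged sketch of per-pod CPU usage over the last five minutes, using the standard cAdvisor metric name (the namespace is illustrative):

```promql
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="your-name-test"}[5m]))
```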
In Grafana, go to Explore → select Prometheus as the data source → query with PromQL:
```promql
container_memory_usage_bytes{namespace="your-name-test"}
```

Grafana: Dashboards
Grafana comes with pre-built dashboards for cluster monitoring. Access them from the sidebar → Dashboards.
Useful built-in dashboards:
- Kubernetes / Compute Resources / Namespace: CPU and memory per namespace
- Kubernetes / Compute Resources / Pod: CPU and memory per pod
- Node Exporter Full: detailed node health
AI log analysis
The log viewers in the web console (for apps, functions, and jobs) include an Analyse button. Click it to send the currently visible logs to the configured AI provider for analysis.
The AI scans the log output for errors, warnings, stack traces, and unusual patterns. It returns a summary of what happened, highlights the most likely root cause, and suggests next steps. This is especially useful when debugging unfamiliar stack traces or sifting through high-volume log output where the signal is buried in noise.
AI log analysis works with both live streaming logs and Loki history queries. The analysis uses whatever logs are currently displayed. Use the time range and search filters to narrow the context before clicking Analyse.
Requires an AI provider to be configured in the Settings page. See Configuration: AI provider settings for setup.
Disabling monitoring
On smaller servers (8-12 GB RAM), the monitoring stack can be disabled to free approximately 1-2 GB of memory for your applications. Logs from the web console (live streaming via kip app logs and the Console log viewer) continue to work. Only persistent log storage and metrics collection are affected.
Monitoring lives in the platform layer. The same kip platform commands that manage Prometheus and Loki memory limits also toggle them on and off. See Platform Resources for the full picture.
Disable
```shell
kip platform disable prometheus
kip platform disable loki
```

The platform reconciler in console-api picks the change up and deletes the underlying HelmCharts; helm-controller then uninstalls the releases.
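You can watch the teardown from the cluster side. A sketch, assuming `kubectl` access to the cluster; the chart and namespace layout follows the Architecture section below:

```
# HelmCharts live in kube-system; the monitoring workloads themselves in monitoring.
kubectl -n kube-system get helmcharts
kubectl -n monitoring get pods
```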
Re-enable
```shell
kip platform enable prometheus
kip platform enable loki
```

The HelmCharts are re-created from the same templates the installer uses, with the active profile's memory values.
Check status
```shell
kip platform status
```

```
Platform profile: medium
8-16 GB host. Real workloads, sensible defaults.

prometheus   on   limit 1Gi
loki         on   limit 512Mi
```

When a component is disabled, kip status shows it as "disabled" rather than unhealthy.
Legacy kip monitoring
kip monitoring enable/disable/status still works as a thin compatibility wrapper around kip platform. New scripts should call kip platform directly; the old form will be removed in a future release.
Resource usage
Prometheus and Loki memory scale with the platform sizing profile that kip install picks based on node memory. The other observability components (Grafana, Promtail, kube-state-metrics, node-exporter) have small, near-flat footprints across all profiles.
Per-profile memory limits (request is typically half the limit):
| Profile | Prometheus limit | Loki limit |
|---|---|---|
| nano | disabled | disabled |
| small | 512 Mi | 384 Mi |
| medium | 1 Gi | 512 Mi |
| large | 1 Gi | 512 Mi |
| xlarge | 2 Gi | 1 Gi |
If Prometheus or Loki hits its limit and gets OOMKilled, the platform reconciler auto-bumps the limit (up to per-component ceilings of 4 Gi for Prometheus and 2 Gi for Loki) so a workload that outgrew the profile default doesn't fail silently. You can also override the limit manually via the console's Platform page or kip platform resize. See Platform Resources for the full reference.
Grafana sits at 64 Mi request / 128 Mi limit across all profiles. Promtail is 32 Mi / 128 Mi. kube-state-metrics and node-exporter each sit at 32 Mi / 64 Mi.
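The "request is typically half the limit" rule can be sketched in shell. The helper name is illustrative, not part of the kip CLI:

```shell
# Derive the default memory request from a profile limit (request = limit / 2).
half_limit() {
  case "$1" in
    *Gi) echo "$(( ${1%Gi} * 512 ))Mi" ;;  # 1Gi -> 1024Mi -> 512Mi
    *Mi) echo "$(( ${1%Mi} / 2 ))Mi" ;;
  esac
}

half_limit 1Gi     # medium-profile Prometheus limit -> 512Mi request
half_limit 512Mi   # medium-profile Loki limit -> 256Mi request
```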
Data retention
- Metrics (Prometheus): 3 days
- Logs (Loki): 3 days
For longer retention, update the Helm values via the k3s HelmChart resource in kube-system.
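For example, extra values can be merged into the chart through the HelmChart's `valuesContent` field. A sketch for Prometheus; the resource name and values keys are assumptions that depend on the chart Kipper ships, so check the installed chart's values first:

```yaml
# Hypothetical sketch -- verify the HelmChart name and the chart's values
# schema before applying (kubectl -n kube-system get helmcharts).
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: prometheus
  namespace: kube-system
spec:
  valuesContent: |-
    server:
      retention: 7d
```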
Architecture
All components run in the monitoring namespace and are managed by Helm charts via k3s.