Observability

Kipper includes a built-in observability stack for production monitoring: Loki for logs, Prometheus for metrics, and Grafana for dashboards.

All three are installed automatically during kip install and configured to work together out of the box.

Accessing Grafana

Grafana is available at https://grafana-<your-domain>, for example:

https://grafana-46-225-91-12.kipper.run

Default credentials:

  • Username: admin
  • Password: kipper

Change the password after first login.
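
You can change it from the Grafana UI (profile → Change password), or from the command line. A minimal sketch, assuming kubectl access and that the Grafana deployment in the monitoring namespace is named grafana:

bash
# Deployment name is an assumption; check with: kubectl -n monitoring get deploy
kubectl -n monitoring exec deploy/grafana -- \
  grafana-cli admin reset-admin-password 'new-strong-password'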

What's included

Loki: Log aggregation

Loki collects logs from all pods across all namespaces. Unlike streaming logs from a single pod (kip app logs), Loki gives you:

  • Persistent logs: survive pod restarts and crashes
  • Searchable: filter by app, namespace, time range, or text content
  • Multi-pod: see logs from all replicas of an app in one view

In Grafana, go to Explore → select Loki as the data source → query with LogQL:

{namespace="yourr-name-test", app="domain-service"}

Filter for errors:

{namespace="yourr-name-test", app="domain-service"} |= "ERROR"

Prometheus: Metrics

Prometheus collects CPU, memory, network, and request metrics from all pods and nodes. It comes pre-configured with:

  • Node exporter: CPU, memory, disk, network per node
  • kube-state-metrics: pod status, deployment health, replica counts
  • Pod metrics: CPU and memory usage per container

In Grafana, go to Explore → select Prometheus as the data source → query with PromQL:

container_memory_usage_bytes{namespace="yourr-name-test"}
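
Rates are often more useful than raw gauges or counters. As a sketch, per-pod CPU usage over the last five minutes, using the standard cAdvisor metric name (assumed to be scraped by the default config):

sum(rate(container_cpu_usage_seconds_total{namespace="your-name-test"}[5m])) by (pod)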

Grafana: Dashboards

Grafana comes with pre-built dashboards for cluster monitoring. Access them from the sidebar → Dashboards.

Useful built-in dashboards:

  • Kubernetes / Compute Resources / Namespace: CPU and memory per namespace
  • Kubernetes / Compute Resources / Pod: CPU and memory per pod
  • Node Exporter Full: detailed node health

AI log analysis

The log viewers in the web console (for apps, functions, and jobs) include an Analyse button. Click it to send the currently visible logs to the configured AI provider for analysis.

The AI scans the log output for errors, warnings, stack traces, and unusual patterns. It returns a summary of what happened, highlights the most likely root cause, and suggests next steps. This is especially useful when debugging unfamiliar stack traces or sifting through high-volume log output where the signal is buried in noise.

AI log analysis works with both live streaming logs and Loki history queries. The analysis uses whatever logs are currently displayed. Use the time range and search filters to narrow the context before clicking Analyse.

Requires an AI provider to be configured in the Settings page. See Configuration: AI provider settings for setup.

Disabling monitoring

On smaller servers (8-12 GB RAM), the monitoring stack can be disabled to free approximately 1-2 GB of memory for your applications. Live log streaming (kip app logs and the Console log viewer in the web console) continues to work; only persistent log storage and metrics collection are affected.

Monitoring lives in the platform layer. The same kip platform commands that manage Prometheus and Loki memory limits also toggle them on and off. See Platform Resources for the full picture.

Disable

bash
kip platform disable prometheus
kip platform disable loki

The platform reconciler in console-api picks the change up and deletes the underlying HelmCharts; helm-controller then uninstalls the releases.
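
To confirm the teardown, you can check that the HelmChart resources are gone (a sketch, assuming kubectl access to the cluster):

bash
# HelmChart resources are managed by the k3s helm-controller in kube-system
kubectl -n kube-system get helmcharts.helm.cattle.io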

Re-enable

bash
kip platform enable prometheus
kip platform enable loki

The HelmCharts are re-created from the same templates the installer uses, with the active profile's memory values.
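
To watch the components come back up (again assuming kubectl access):

bash
# Prometheus and Loki pods should reappear in the monitoring namespace
kubectl -n monitoring get pods -w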

Check status

bash
kip platform status
  Platform profile: medium
  8-16 GB host. Real workloads, sensible defaults.

    prometheus   on        limit 1Gi
    loki         on        limit 512Mi

When a component is disabled, kip status shows it as "disabled" rather than unhealthy.

Legacy kip monitoring

kip monitoring enable/disable/status still works as a thin compatibility wrapper around kip platform. New scripts should call kip platform directly; the old form will be removed in a future release.

Resource usage

Prometheus and Loki memory scale with the platform sizing profile that kip install picks based on node memory. The other observability components (Grafana, Promtail, kube-state-metrics, node-exporter) have small, near-flat footprints across all profiles.

Per-profile memory limits (request is typically half the limit):

Profile   Prometheus limit   Loki limit
nano      disabled           disabled
small     512 Mi             384 Mi
medium    1 Gi               512 Mi
large     1 Gi               512 Mi
xlarge    2 Gi               1 Gi

If Prometheus or Loki hits its limit and gets OOMKilled, the platform reconciler auto-bumps the limit (up to per-component ceilings of 4 Gi for Prometheus and 2 Gi for Loki) so a workload that outgrew the profile default doesn't fail silently. You can also override the limit manually via the console's Platform page or kip platform resize. See Platform Resources for the full reference.
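
The exact arguments for kip platform resize aren't shown here; a purely hypothetical invocation, with the component-and-limit argument form assumed (check the CLI help for the real syntax):

bash
# Hypothetical argument form; verify against the CLI help before use
kip platform resize prometheus 2Gi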

Grafana sits at 64 Mi request / 128 Mi limit across all profiles. Promtail is 32 Mi / 128 Mi. kube-state-metrics and node-exporter each sit at 32 Mi / 64 Mi.

Data retention

  • Metrics (Prometheus): 3 days
  • Logs (Loki): 3 days

For longer retention, update the Helm values via the k3s HelmChart resource in kube-system.
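
For example, Prometheus retention could be raised by editing the corresponding HelmChart's values. A sketch only: the chart name and the retention key both depend on which Helm chart the installer uses, so treat them as assumptions and inspect the HelmChart spec first:

bash
# Chart name is an assumption; list the charts first with:
#   kubectl -n kube-system get helmcharts
kubectl -n kube-system edit helmchart prometheus
# Then adjust the chart's retention setting under spec.valuesContent
# (the exact key depends on the chart in use).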

Architecture

All components run in the monitoring namespace and are managed by Helm charts via k3s.
