Alerts Loki

Optimizing queries

Always optimize queries (alerting expressions) as much as possible. A query should normally specify the cluster, the namespace, and the application or pod, or a well-defined list of applications. Wider queries run over larger datasets, which takes longer and is very demanding on resources. Well-defined queries are even more crucial in alerting than in ordinary log searches, because alerting rules are evaluated constantly.
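As a rough illustration of the difference, the fragment below contrasts a broad selector with a narrow one. It is a sketch only: the label names are borrowed from the example rule on this page, the label values are placeholders, and the `|= "error"` line filter is an assumption chosen for brevity.

```yaml
# Broad (avoid): every stream in the namespace is scanned on each evaluation.
# expr: 'sum(count_over_time({namespace="your-namespace"} |= "error" [10m])) > 1'

# Narrow (prefer): cluster, namespace and application are all pinned,
# so only the streams of one application are read.
expr: 'sum(count_over_time({cluster_name="your-cluster", namespace="your-namespace", service_name="your-application"} |= "error" [10m])) > 1'
```

Both expressions count matching lines over ten minutes; the narrow one simply restricts which streams are read.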

Logs vs Metrics

As a rule of thumb, metrics should be used for monitoring applications and logs for debugging. Metrics are preferable for alerting because querying a time-series database is more efficient, reliable, and cost-effective than querying a log system. In most cases the preferred alerting solution is to add custom Prometheus metrics to your application and create Prometheus alerts based on those metrics; see Custom prometheus metrics.
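A metric-based alert for the same kind of condition might look like the sketch below. The metric name `your_app_known_errors_total` is hypothetical and stands in for a counter your application would have to expose; the resource and rule names are placeholders, and any labels your Prometheus setup needs on the resource to pick up the rule are not shown.

```yaml
# Sketch only: your_app_known_errors_total is a hypothetical counter metric.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: "alerting-your-app-name"
  namespace: "your-namespace"
spec:
  groups:
    - name: app-name
      rules:
        - alert: KnownErrorInApp
          # Fires when the counter grew by more than 1 over the last 10 minutes.
          expr: 'sum(increase(your_app_known_errors_total{namespace="your-namespace"}[10m])) > 1'
          labels:
            severity: warning
            namespace: "your-namespace"
          annotations:
            summary: KnownError counted by application metric
```

Because no `loki-rule` label is set, this is an ordinary Prometheus rule evaluated against metrics rather than logs.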

Loki rules

A Loki alerting rule is created as a PrometheusRule Kubernetes resource with one additional label, loki-rule: "true". See Examples of Loki alerting rules.

Required rule label

You should always add the label namespace to your alert rules. This will ensure that Slack messages are routed to the channel defined for your namespace in the terraform-aks repo.

loki-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: "loki-alerting-your-app-name"
  namespace: "your-namespace"
  labels:
    loki-rule: "true" # this label is important for alerts to be sent to Loki
    app: "your-app-name"
    environment: "environment"
spec:
  groups:
    - name: app-name
      rules:
        - alert: KnownErrorInApp
          expr: 'sum by (error) (count_over_time({cluster_name="your-cluster", namespace="your-namespace", service_name="your-application", pod_container_name="your-application"} | json | level=`error` | pattern `<_> <well-defined-error> pattern-text <_>` [10m])) > 1'
          labels:
            severity: warning
            namespace: "your-app-namespace" # this label is important for sending alerts to the correct Slack channel
            ruler: loki
          annotations:
            summary: KnownError appearing in logs