Prometheus Operator configuration reference for Lokomotive
Introduction
The Prometheus Operator for Kubernetes provides easy monitoring definitions for Kubernetes services and deployment and management of Prometheus instances.
Prerequisites
-
A Lokomotive cluster with a PersistentVolume plugin, e.g. OpenEBS or one of the built-in plugins.
-
An ingress controller such as Contour for HTTP ingress.
-
The cert-manager component is required for enabling HTTPS.
Configuration
Prometheus Operator component configuration example:
component "prometheus-operator" {
namespace = "monitoring"
grafana {
admin_password = "foobar"
secret_env = { # This might contain sensitive information, declare a variable and define this in `lokocfg.vars`.
"KEY" = "VERY_SECRET"
}
ingress {
host = "grafana.mydomain.net"
class = "contour"
certmanager_cluster_issuer = "letsencrypt-production"
}
}
operator {
tolerations {
key = "lokomotive.io/operator"
operator = "Equal"
value = "test"
effect = "NoSchedule"
}
admission_webhook_tolerations {
key = "lokomotive.io/operator-admission-webhook"
operator = "Equal"
value = "test"
effect = "NoSchedule"
}
}
prometheus {
metrics_retention = "14d"
storage_size = "50Gi"
node_selector = {
"kubernetes.io/hostname" = "worker3"
}
ingress {
host = "prometheus.mydomain.net"
class = "contour"
certmanager_cluster_issuer = "letsencrypt-production"
}
watch_labeled_service_monitors = true
watch_labeled_prometheus_rules = true
external_labels = {
"cluster" = var.cluster_name
}
}
alertmanager {
retention = "360h"
external_url = "https://api.example.com/alertmanager"
config = file("alertmanager-config.yaml")
node_selector = {
"kubernetes.io/hostname" = "worker3"
}
tolerations {
key = "lokomotive.io/alertmanager"
operator = "Equal"
value = "test"
effect = "NoSchedule"
}
}
}
Create alertmanager-config.yaml
file if necessary. Visit the
alertmanager
configuration
for more
information.
Note: Make sure the whole file is indented two spaces. That is, there are two spaces before the top level block.
config:
global:
resolve_timeout: 5m
route:
group_by:
- job
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: "null"
routes:
- match:
alertname: Watchdog
receiver: "null"
receivers:
- name: "null"
NOTE: Ensure the file alertmanager_config.yaml
is added to .gitignore
to avoid any accidental exposure
of sensitive data. Alternatively you can store the alertmanager configuration in lokocfg.vars
as below:
#lokocfg.vars
alertmanager_config = <<EOF
config:
global:
resolve_timeout: 5m
route:
group_by:
- job
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: 'null'
routes:
- match:
alertname: Watchdog
receiver: 'null'
receivers:
- name: 'null'
EOF
Attribute reference
Table of all the arguments accepted by the component.
Argument | Description | Default | Type | Required |
---|---|---|---|---|
namespace |
Namespace to deploy the Prometheus Operator. | monitoring |
string | false |
grafana.admin_password |
Password for admin user in Grafana. If not provided it is auto generated and stored in secret prometheus-operator-grafana . |
- | string | false |
grafana.secret_env |
Sensitive environment variables passed to Grafana pod and stored as secret. Read more on manipulating grafana.ini using env var
here
. |
- | map(string) | false |
grafana.ingress.host |
Ingress URL host to expose Grafana over the internet. NOTE: When running on Equinix Metal, a DNS entry pointing at the ingress controller needs to be created. | - | string | true |
grafana.ingress.class |
Ingress class to use for Grafana ingress. | contour |
string | false |
grafana.ingress.certmanager_cluster_issuer |
ClusterIssuer to be used by cert-manager while issuing TLS certificates. Supported values: letsencrypt-production , letsencrypt-staging . |
letsencrypt-production |
string | false |
operator.node_selector |
Node selector to specify nodes where the Prometheus Operator pods should be deployed. | {} | map(string) | false |
operator.tolerations |
Toleration that prometheus operator will tolerate. | - | list(object({key = string, effect = string, operator = string, value = string, toleration_seconds = string })) | false |
operator.admission_webhook_tolerations |
Toleration that prometheus operator admission webhook patch job will tolerate. | - | list(object({key = string, effect = string, operator = string, value = string, toleration_seconds = string })) | false |
prometheus.metrics_retention |
Time duration Prometheus shall retain data for. Must match the regular expression [0-9]+(ms|s|m|h|d|w|y) (milliseconds, seconds, minutes, hours, days, weeks and years). |
10d |
string | false |
prometheus.node_selector |
Node selector to specify nodes where the Prometheus pods should be deployed. | {} | map(string) | false |
prometheus.tolerations |
Toleration that prometheus pods will tolerate. | - | list(object({key = string, effect = string, operator = string, value = string, toleration_seconds = string })) | false |
prometheus.storage_size |
Storage capacity for the Prometheus in bytes. You can express storage as a fixed-point integer using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. | “50Gi” | string | false |
prometheus.watch_labeled_service_monitors |
By default prometheus operator watches only the ServiceMonitor objects in the cluster that are labeled release: prometheus-operator . If set to false then all the ServiceMonitors will be watched. |
true |
bool | false |
prometheus.watch_labeled_prometheus_rules |
By default prometheus operator watches only the PrometheusRule objects in the cluster that are labeled release: prometheus-operator and app: kube-prometheus-stack . If set to false then all the PrometheusRule will be watched. |
true |
bool | false |
prometheus.external_labels |
This is the Prometheus parameter with the same name. The labels to add to any time series or alerts when communicating with external systems (federation, remote storage, Alertmanager). | - | map(string) | false |
prometheus.ingress.host |
Ingress URL host to expose Prometheus over the internet. NOTE: When running on Equinix Metal, a DNS entry pointing at the ingress controller needs to be created. | - | string | true |
prometheus.ingress.class |
Ingress class to use for Prometheus ingress. | contour |
string | false |
prometheus.ingress.certmanager_cluster_issuer |
ClusterIssuer to be used by cert-manager while issuing TLS certificates. Supported values: letsencrypt-production , letsencrypt-staging . |
letsencrypt-production |
string | false |
prometheus.external_url |
The URL on which Prometheus will be accessible. If not provided, the URL is taken from prometheus.ingress.host with https as a scheme. |
- | string | false |
alertmanager.retention |
Time duration Alertmanager shall retain data for. Must match the regular expression [0-9]+(ms|s|m|h) (milliseconds, seconds, minutes and hours). |
120h |
string | false |
alertmanager.external_url |
The external URL the Alertmanager instances will be available under. This is necessary to generate correct URLs. This is necessary if Alertmanager is not served from root of a DNS name. | "" | string | false |
alertmanager.config |
Provide YAML file path to configure Alertmanager. See https://prometheus.io/docs/alerting/configuration/#configuration-file . | {"global":{"resolve_timeout":"5m"},"route":{"group_by":["job"],"group_wait":"30s","group_interval":"5m","repeat_interval":"12h","receiver":"null","routes":[{"match":{"alertname":"Watchdog"},"receiver":"null"}]},"receivers":[{"name":"null"}]} |
string | false |
alertmanager.node_selector |
Node selector to specify nodes where the AlertManager pods should be deployed. | {} | map(string) | false |
alertmanager.tolerations |
Toleration that AlertManager will tolerate. | - | list(object({key = string, effect = string, operator = string, value = string, toleration_seconds = string })) | false |
alertmanager.storage_size |
Storage capacity for the Alertmanager in bytes. You can express storage as a fixed-point integer using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. | “50Gi” | string | false |
disable_webhooks |
Disables validation and mutation webhooks. This might be required on older versions of Kubernetes to install successfully. | false | bool | false |
monitor |
Block, which allows to disable scraping of individual Kubernetes components. | - | object | false |
monitor.etcd |
Controls if the default Prometheus instance should scrape etcd metrics. | true | bool | false |
monitor.kube_controller_manager |
Controls if the default Prometheus instance should scrape kube-controller-manager metrics. | true | bool | false |
monitor.kube_scheduler |
Controls if the default Prometheus instance should scrape kube-scheduler metrics. | true | bool | false |
monitor.kube_proxy |
Controls if the default Prometheus instance should scrape kube-proxy metrics. | true | bool | false |
monitor.kubelet |
Controls if the default Prometheus instance should scrape kubelet metrics. | true | bool | false |
coredns |
Block, which allows to customize, how CoreDNS is scraped. | - | object | false |
coredns.selector |
Defines, how CoreDNS pods should be selected for scraping. | {“k8s-app”:“coredns”,“tier”:“control-plane”} | map(string) | false |
storage_class |
Storage Class to use for the storage allowed for Prometheus and Alertmanager. | - | string | false |
Applying
To apply the Prometheus Operator component:
lokoctl component apply prometheus-operator
Post-installation
To start monitoring your applications running on Kubernetes. Just create a ServiceMonitor
object
in that namespace which looks like following:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: openebs
name: openebs
namespace: openebs
spec:
endpoints:
- path: /metrics
port: exporter
namespaceSelector:
matchNames:
- openebs
selector:
matchLabels:
openebs.io/cas-type: cstor
Change the labels
, endpoints
, namespaceSelector
, selector
fields as you need. To learn more
about basics of ServiceMonitor
read the docs
here
and
the API Reference can be found
here
.
Deleting
To destroy the component:
lokoctl component delete prometheus-operator --delete-namespace