Platform Implementation

Platform Implementation describes how the platform works rather than how to use it. This section of the docs is for platform engineers who want to contribute or understand how things work under the covers.

Alerting

Alert manager

Managed Alertmanager is a single-replica StatefulSet deployed with Google Managed Prometheus. It receives alerts from the rule evaluator and sends notifications to the configured receivers.

kubectl -n gmp-system get sts alertmanager 
kubectl -n gmp-system get deploy rule-evaluator

Alert definitions

Alerts are defined using Rules, ClusterRules or GlobalRules.

The Rules spec follows the same format as a Prometheus rules file, which makes it possible to test rules using promtool. To view alert rules, run:

kubectl -n platform-monitoring describe rules
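
Because the spec mirrors a Prometheus rules file, one way to feed a Rules resource to promtool is to lift its groups into a standalone document. The following is a minimal sketch, assuming the resource was exported as JSON (for example with `kubectl get rules <name> -o json`) and that the groups live under `spec.groups`. The function name and the sample alert are hypothetical.

```python
import json


def rules_to_promtool_doc(rules_resource: dict) -> dict:
    """Build a Prometheus-style rules document from a Rules resource.

    Assumption: the rule groups are stored under spec.groups.
    """
    groups = rules_resource.get("spec", {}).get("groups", [])
    return {"groups": groups}


# Hypothetical resource, shaped like `kubectl get rules <name> -o json` output.
sample = {
    "apiVersion": "monitoring.googleapis.com/v1",
    "kind": "Rules",
    "metadata": {"name": "example-rules", "namespace": "platform-monitoring"},
    "spec": {
        "groups": [
            {
                "name": "example",
                "interval": "30s",
                "rules": [
                    {
                        "alert": "ExampleAlert",
                        "expr": "up == 0",
                        "for": "5m",
                        "labels": {"severity": "warning"},
                    }
                ],
            }
        ]
    },
}

promtool_doc = rules_to_promtool_doc(sample)
# Print the document so it can be redirected to a file.
print(json.dumps(promtool_doc, indent=2))
```

Since JSON is a subset of YAML, the printed document should be usable as input to `promtool check rules <file>`, although that has not been verified against this platform's rules.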

Grafana

Grafana is installed using the Grafana operator, which manages the Grafana instance, dashboards and datasources using CRDs. CRDs API reference: https://grafana-operator.github.io/grafana-operator/docs/api/

The operator and the Grafana instance each run as a deployment:

kubectl -n platform-monitoring get deploy grafana-operator
kubectl -n platform-monitoring get deploy platform-grafana-deployment

Dashboards

Dashboards are automatically synced by the operator. You can use the grafanadashboard resources to check their status and when they were last synced.

kubectl -n platform-monitoring get grafanadashboard
NAME                    NO MATCHING INSTANCES   LAST RESYNC   AGE
bastion                                         42m           7h43m
continuous-load                                 2m17s         7h43m
kubernetes-apiserver                            42m           7h43m
[...]
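
The LAST RESYNC column can also be checked programmatically. The sketch below parses output shaped like the listing above and returns the dashboards whose last resync is older than a threshold. The function names and the threshold are hypothetical, and the parsing assumes the column layout shown here.

```python
import re
from typing import List

_DURATION_PART = re.compile(r"(\d+)([dhms])")
_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text: str) -> int:
    """Convert a kubectl-style age such as '7h43m' or '2m17s' to seconds."""
    return sum(int(n) * _SECONDS[u] for n, u in _DURATION_PART.findall(text))


def stale_dashboards(kubectl_output: str, max_age_seconds: int) -> List[str]:
    """Return names of dashboards whose last resync exceeds max_age_seconds."""
    stale = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # not a dashboard row (e.g. a truncation marker)
        # LAST RESYNC is the second-to-last column whether or not
        # NO MATCHING INSTANCES has a value.
        name, last_resync = fields[0], fields[-2]
        if duration_to_seconds(last_resync) > max_age_seconds:
            stale.append(name)
    return stale


sample_output = """
NAME                    NO MATCHING INSTANCES   LAST RESYNC   AGE
bastion                                         42m           7h43m
continuous-load                                 2m17s         7h43m
kubernetes-apiserver                            42m           7h43m
"""

# Dashboards not resynced in the last 10 minutes.
print(stale_dashboards(sample_output, max_age_seconds=600))
```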

When exporting dashboard JSON from Grafana, make sure special characters are replaced as follows:

replace {{ target }} with {{ "{{" }} target {{ "}}" }}
replace $somevar with ${somevar}
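
The two replacements above can be applied mechanically. The following is a minimal sketch with a hypothetical function name; it assumes variable names consist of letters, digits and underscores. It only implements the two rules as written and makes no claim about why the escaping is needed.

```python
import re


def escape_dashboard_json(text: str) -> str:
    """Apply the two export replacements to exported dashboard JSON text."""
    # Wrap each literal {{ or }} in an expression that prints it. Doing this
    # in a single pass avoids re-escaping the braces the replacement adds.
    text = re.sub(r"\{\{|\}\}", lambda m: '{{ "' + m.group(0) + '" }}', text)
    # Rewrite $somevar as ${somevar}; names already written as ${...} do not
    # match because "$" is followed by "{" there.
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", r"${\1}", text)


print(escape_dashboard_json('"legendFormat": "{{ target }}"'))
print(escape_dashboard_json('"expr": "rate(requests_total[$interval])"'))
```

Running the function on an already escaped file would escape it a second time, so apply it once to a fresh export.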

Datasources

Prometheus: points to the prometheus frontend to access all dashboard metrics
Alertmanager: points to the managed alertmanager to manage silences, view firing alerts, contact points, and notification policies

Infra Connector

Infra connector is a module in the reference core platform that allows you to create cloud objects using Kubernetes resources.

The current implementation for GCP uses the Kubernetes Config Connector. Installing it makes available a variety of CRDs that allow you to create different GCP resources without writing Terraform code. For example:

apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMServiceAccount
metadata:
  name: {{include "account.fullname" .Values.tenant.name }}
  annotations:
    propagate.hnc.x-k8s.io/none: "true"
spec:
  displayName: GCP Service Account for tenant {{ .Values.tenant.name }}

This creates a GCP service account for each tenant being provisioned.

Current usage

The goal of this module is to decouple Terraform from the platform modules. It lets us create cloud resources with something like Helm instead of tying us to Terraform, which makes it much easier to couple or decouple other modules. As a result, this is one of only two modules in the current implementation that use Terraform; everything else is installed with the help of a script. Modules that require cloud resources create them using the infra connector CRDs.

Future usage

Another advantage is that tenants can create the GCP resources they need, such as buckets and databases, without reaching out to the platform team or a DevOps team, which makes them more independent. What they can and cannot create will be controlled with a mix of RBAC and the policy controller: a Role specifies which objects they can create, and the policy controller ensures that what they create is allowed and does not impact any other tenant.

Metrics collection

The platform uses Google Managed Prometheus, which comes with scalable backend Prometheus storage and metrics collectors that scrape exposed metrics endpoints, such as kubelet/cadvisor and kube-state-metrics, via CRDs. CRDs are defined here: https://github.com/GoogleCloudPlatform/prometheus-engine/blob/v0.7.4/doc/api.md

The GMP operator runs as a deployment:

kubectl -n gmp-system get deploy gmp-operator

Kube state metrics - docs

Generates metrics from a wide range of Kubernetes objects. These can be used to assess the health of your pods, deployments, jobs and many other Kubernetes objects.

Metric names generally start with kube_.

It runs as a deployment:

kubectl -n gmp-public get deploy kube-state-metrics

Note that GMP re-labels namespace to exported_namespace as it reserves namespace for the namespace of the pod that the metric is scraped from. When importing dashboards that rely on kube-state-metrics metrics, the queries must use exported_namespace.
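
When importing a dashboard, the label rename can be applied to each query with a small rewrite. The following is a rough sketch with a hypothetical function name, intended only for queries that use kube-state-metrics metrics. It rewrites `namespace` where it appears as a label name in matchers and in `by`, `without`, `on` and `ignoring` clauses, and leaves Grafana variables such as `$namespace` untouched. It is regex-based rather than a real PromQL parser, so review the results.

```python
import re

# "namespace" used as a label name in a matcher, e.g. {namespace="x"}.
_MATCHER = re.compile(r"(?<![\w$])namespace(?=\s*(?:=~|!=|!~|=))")
# Label lists of grouping and vector-matching clauses, e.g. by (namespace, pod).
_LABEL_LIST = re.compile(r"(\b(?:by|without|on|ignoring)\s*\()([^)]*)(\))")
# A bare "namespace" label inside such a list.
_BARE_LABEL = re.compile(r"(?<![\w$])namespace(?!\w)")


def use_exported_namespace(query: str) -> str:
    """Rewrite the namespace label to exported_namespace in a PromQL query."""
    query = _MATCHER.sub("exported_namespace", query)

    def fix_labels(match):
        labels = _BARE_LABEL.sub("exported_namespace", match.group(2))
        return match.group(1) + labels + match.group(3)

    return _LABEL_LIST.sub(fix_labels, query)


original = 'sum by (namespace) (kube_pod_info{namespace="$namespace"})'
print(use_exported_namespace(original))
```

The rewrite is idempotent, so running it over a dashboard that already uses exported_namespace leaves the queries unchanged.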

cadvisor - docs

Collects metrics for containers running on the node; it runs alongside kubelet on each node. Typical metrics include CPU, memory and I/O usage, which can be used to diagnose performance issues.

Metric names generally start with container_.

kubelet - docs

kubelet is the agent running on the node that is responsible for ensuring containers are running and healthy. Collected metrics can be used to identify pod start duration, the number of pods and containers on the node, and other information about the node, such as its status.

Blackbox exporter - docs

This is used to probe key endpoints on or outside the platform, so we can monitor uptime and SSL expiry of components with TLS termination. It runs as a deployment:

kubectl -n platform-monitoring get deploy prometheus-blackbox-exporter
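
To illustrate the SSL expiry check: the blackbox exporter exposes the metric probe_ssl_earliest_cert_expiry, a Unix timestamp for the earliest-expiring certificate in the probed chain. The sketch below shows the arithmetic an expiry check performs on that value. The function names and the 30-day threshold are hypothetical and do not reflect the platform's actual alert rules.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_unix: float, now_unix: float) -> float:
    """Days between now and the certificate expiry timestamp."""
    return (expiry_unix - now_unix) / SECONDS_PER_DAY


def expires_soon(expiry_unix: float, now_unix: float,
                 threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within threshold_days."""
    return days_until_expiry(expiry_unix, now_unix) < threshold_days


# Hypothetical values: "now" and a certificate expiring 10 days later.
now = 1_700_000_000
expiry = now + 10 * SECONDS_PER_DAY

print(days_until_expiry(expiry, now))
print(expires_soon(expiry, now))
```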

Node exporter - docs

Collects hardware and OS-level system metrics exposed on the node. Metrics include host memory, cpu, disk and network. It runs as a daemonset:

kubectl -n gmp-public get ds node-exporter

Tenancy

Tenants are organised via the Hierarchical Namespace Controller. The platform itself uses the following top-level namespaces:

cecg-system: Internal components
reference-applications: Applications to show you how to use the platform

Application teams can create tenancies under root or under another top-level folder, e.g. tenants:

❯ kubectl hns tree root
root
├── cecg-system
│   ├── platform-ingress
│   ├── platform-monitoring
│   └── platform-policy
├── reference-applications
│   ├── knowledge-platform
│   └── golang
│       ├── [s] golang-functional
│       ├── [s] golang-nft
│       └── [s] golang-integration
└── tenants
    ├── cecg-playground
    └── devops-playground

[s] indicates subnamespaces
