Platform Implementation
Platform Implementation describes how things work rather than how to use them. This section of the docs is for Platform Engineers wanting to contribute or understand how things work under the covers.
The managed Alertmanager is a single-replica StatefulSet deployed with Google Managed Prometheus. It receives alerts from the rule evaluator and sends notifications to configured receivers.
kubectl -n gmp-system get sts alertmanager
kubectl -n gmp-system get deploy rule-evaluator
Alerts are defined using Rules, ClusterRules or GlobalRules.
The Rules spec follows the same format as a Prometheus rules file, which makes it possible to test rules using promtool. To view alert rules, run:
kubectl -n platform-monitoring describe rules
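As an illustration, a minimal Rules resource might look like the following sketch; the name, expression and threshold are hypothetical, while monitoring.googleapis.com/v1 is the API group documented by Google Managed Prometheus:

apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: example-alerts            # hypothetical name
  namespace: platform-monitoring
spec:
  groups:
    - name: pod-health            # a standard Prometheus rule group
      interval: 60s
      rules:
        - alert: PodRestarting    # hypothetical alert
          expr: rate(kube_pod_container_status_restarts_total[10m]) > 0
          for: 15m
          labels:
            severity: warning

Because spec.groups uses the standard rule-group format, the groups can be copied into a plain rules file and validated with promtool.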
Grafana is installed using the Grafana Operator, which manages the Grafana instance, dashboards and datasources using CRDs. CRD API reference: https://grafana-operator.github.io/grafana-operator/docs/api/
It runs as a deployment:
kubectl -n platform-monitoring get deploy grafana-operator
kubectl -n platform-monitoring get deploy platform-grafana-deployment
Dashboards are automatically synced by the operator. You can use the grafanadashboard resources to check their status and when they were last synced.
kubectl -n platform-monitoring get grafanadashboard
NAME                   NO MATCHING INSTANCES   LAST RESYNC   AGE
bastion                                        42m           7h43m
continuous-load                                2m17s         7h43m
kubernetes-apiserver                           42m           7h43m
[...]
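For reference, a GrafanaDashboard resource might look like the following sketch, assuming the grafana.integreatly.org/v1beta1 API of the Grafana Operator; the name and labels are hypothetical:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: example-dashboard        # hypothetical name
  namespace: platform-monitoring
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana        # must match labels on the Grafana instance
  json: >
    {
      "title": "Example Dashboard",
      "panels": []
    }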
When exporting dashboard JSON from Grafana, make sure special characters are replaced as follows:

replace {{ target }}
with {{ "{{" }} target {{ "}}" }}

replace $somevar
with ${somevar}
The infra connector is a module in the reference core platform that allows cloud objects to be created using Kubernetes resources. The current implementation for GCP uses the k8s Config Connector. Installing it makes a variety of CRDs available that allow different GCP resources to be created without the need to write Terraform code. For example:
apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMServiceAccount
metadata:
name: {{include "account.fullname" .Values.tenant.name }}
annotations:
propagate.hnc.x-k8s.io/none: "true"
spec:
displayName: GCP Service Account for tenant {{ .Values.tenant.name }}
This will create a GCP service account for each tenant being provisioned.
The goal of this module is to decouple Terraform from the platform modules. It lets us create cloud resources with something like Helm instead of tying us down to Terraform, which makes it much easier to couple or decouple other modules. As a result, this is one of only two modules in the current implementation that use Terraform; everything else is installed with the help of a script, and modules that require cloud resources create them using the infra connector CRDs.
Another advantage is that tenants can create GCP resources they might need, such as buckets and databases, without reaching out to the platform or a DevOps team, making them more independent. What they can and can't create is controlled with a mix of RBAC and the policy controller: a Role specifies which objects they can create, and the policy controller ensures that what they create is allowed and won't impact any other tenant.
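As a sketch of the RBAC half, a Role could grant a tenant access to a subset of Config Connector resources; the role name, namespace and chosen resources below are hypothetical:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-infra            # hypothetical name
  namespace: devops-playground  # the tenant's namespace
rules:
  - apiGroups: ["storage.cnrm.cloud.google.com"]
    resources: ["storagebuckets"]
    verbs: ["create", "get", "list", "update", "delete"]
  - apiGroups: ["sql.cnrm.cloud.google.com"]
    resources: ["sqlinstances"]
    verbs: ["create", "get", "list", "update", "delete"]

The policy controller then validates the contents of whatever objects the Role allows to be created.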
The platform uses Google Managed Prometheus (GMP), which comes with scalable backend Prometheus storage and metrics collectors that scrape exposed metrics endpoints, such as kubelet/cAdvisor and kube-state-metrics, configured via CRDs. The CRDs are defined here: https://github.com/GoogleCloudPlatform/prometheus-engine/blob/v0.7.4/doc/api.md
The GMP operator runs as a deployment:
kubectl -n gmp-system get deploy gmp-operator
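As a sketch (the resource names here are hypothetical), a PodMonitoring resource tells the GMP collectors which pods to scrape:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app           # hypothetical name
  namespace: devops-playground
spec:
  selector:
    matchLabels:
      app: example-app        # pods with this label are scraped
  endpoints:
    - port: metrics           # named container port exposing /metrics
      interval: 30s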
kube-state-metrics generates metrics from a wide range of Kubernetes objects. These can be used to assess the health of your pods, deployments, jobs and many other Kubernetes objects. They generally start with kube_.
It runs as a deployment:
kubectl -n gmp-public get deploy kube-state-metrics
Note that GMP re-labels namespace to exported_namespace, as it reserves namespace for the namespace of the pod that the metric is scraped from. When importing dashboards that rely on kube-state-metrics metrics, the queries must use exported_namespace.
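For example, a dashboard query that previously filtered on kube_pod_info{namespace="my-app"} would need to become kube_pod_info{exported_namespace="my-app"} to match the kube-state-metrics series (the metric and namespace here are just illustrative).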
cAdvisor collects metrics for containers running on the node; it runs alongside the kubelet on each node. Typical metrics include CPU, memory and I/O usage, which can be used to diagnose performance issues. They generally start with container_.
The kubelet is the agent running on the node that is responsible for ensuring containers are running and healthy. Collected metrics can be used to identify pod start duration, the number of pods and containers on the node, and other information about the node, such as its status.
The blackbox exporter is used to probe key endpoints on or outside the platform, so we can monitor uptime and SSL expiry of components with TLS termination. It runs as a deployment:
kubectl -n platform-monitoring get deploy prometheus-blackbox-exporter
The node exporter collects hardware and OS-level system metrics exposed on the node, including host memory, CPU, disk and network. It runs as a daemonset:
kubectl -n gmp-public get ds node-exporter
Tenants are organised via the Hierarchical Namespace Controller:

- cecg-system: Internal components
- reference-applications: Applications to show you how to use the platform

Application teams can create tenancies under root or another top-level folder, e.g. tenants.
❯ kubectl hns tree root
root
├── cecg-system
│   ├── platform-ingress
│   ├── platform-monitoring
│   └── platform-policy
├── reference-applications
│   └── knowledge-platform
│       └── golang
│           ├── [s] golang-functional
│           ├── [s] golang-nft
│           └── [s] golang-integration
└── tenants
    ├── cecg-playground
    └── devops-playground

[s] indicates subnamespaces
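As a sketch, a team could create a new tenancy under tenants by applying an HNC subnamespace anchor; the hnc.x-k8s.io/v1alpha2 API version is assumed and the child name is illustrative:

apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: new-playground   # child namespace to create (illustrative)
  namespace: tenants     # parent folder in the hierarchy

This is equivalent to running kubectl hns create new-playground -n tenants.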