The Importance of Using Labels in Kubernetes
Even a small Kubernetes cluster may have hundreds of containers, Pods, Services and many other Kubernetes API objects. It quickly becomes tedious to page through kubectl output to find your object. Labels address this issue. The primary reasons to use labels are:
- They enable you to logically organize all your Kubernetes workloads in all your clusters.
- They enable you to very selectively filter kubectl output to just the objects you need.
- They enable you to understand the layers and hierarchies of all your API objects.
Labels vs. Annotations
Labels and annotations are sometimes confused. A quick look at the documentation shows why: in an object's metadata the two look almost identical.
Labels
"metadata": {
  "labels": {
    "key1" : "value1",
    "key2" : "value2"
  }
}
Annotations
"metadata": {
  "annotations": {
    "key1" : "value1",
    "key2" : "value2"
  }
}
Labels are key/value pairs that are attached to objects, such as Pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects.
You can use Kubernetes annotations to attach arbitrary non-identifying metadata to objects. Clients such as tools and libraries can retrieve this metadata. You can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured.
Example labels:
"release" : "stable"
"release" : "canary"
"environment" : "dev"
"environment" : "qa"
"environment" : "production"
Example annotations:
standbyphone: 000-000 0000
developer: Neil Armstrong
Let's now focus more deeply on labels and how to use them.
Well-Known Labels, Annotations and Taints
Before we create our own labels, let's look at some labels that Kubernetes creates automatically. Kubernetes automatically creates these labels on nodes.
kubernetes.io/arch
Example:
kubernetes.io/arch=amd64
kubernetes.io/os
Example:
kubernetes.io/os=linux
node.kubernetes.io/instance-type
Example:
node.kubernetes.io/instance-type=m3.medium
topology.kubernetes.io/region and topology.kubernetes.io/zone
Example 1:
topology.kubernetes.io/region=us-east-1
Example 2:
topology.kubernetes.io/zone=us-east-1c
These labels now allow us to filter our nodes in the following interesting ways:
List all Linux nodes
$ kubectl get nodes -l 'kubernetes.io/os=linux'
List all nodes with instance type m3.medium
$ kubectl get nodes -l 'node.kubernetes.io/instance-type=m3.medium'
List all nodes in a specific region
$ kubectl get nodes -l 'topology.kubernetes.io/region=us-east-1'
List all nodes in specific regions
$ kubectl get nodes -l 'topology.kubernetes.io/region in (us-east-1, us-west-1)'
If we apply the labels below to all our Pods, we can filter the kubectl output with the commands that follow them:
"release" : "stable"
"release" : "canary"
"environment" : "dev"
"environment" : "qa"
"environment" : "production"
$ kubectl get pods -l 'environment in (production), release in (canary)'
$ kubectl get pods -l 'environment in (production, qa)'
$ kubectl get pods -l 'environment notin (qa)'
In a complex environment with multiple Kubernetes clusters, multiple nodes and many more namespaces, it's easy to see that the ability to filter kubectl output is a major timesaver. In addition, Job, Deployment, ReplicaSet and DaemonSet support set-based selectors as well:
selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
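To make the semantics of such a selector concrete, here is a minimal Python sketch of how matchLabels and matchExpressions could be evaluated against an object's labels. It is an illustration only, not the Kubernetes implementation; the function name matches_selector and the sample Pod labels are assumptions made for this example. All requirements are combined with a logical AND, and NotIn also matches objects that do not carry the key at all.

```python
from typing import Any, Dict, Mapping

# The selector from the YAML above, written as a Python dictionary.
SELECTOR: Dict[str, Any] = {
    "matchLabels": {"component": "redis"},
    "matchExpressions": [
        {"key": "tier", "operator": "In", "values": ["cache"]},
        {"key": "environment", "operator": "NotIn", "values": ["dev"]},
    ],
}


def matches_selector(labels: Mapping[str, str], selector: Mapping[str, Any]) -> bool:
    """Return True if `labels` satisfies every requirement in `selector`.

    Hypothetical helper for illustration; all requirements are ANDed.
    """
    # matchLabels: each key must be present with exactly the given value.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False

    # matchExpressions: set-based requirements.
    for expr in selector.get("matchExpressions", []):
        key = expr["key"]
        operator = expr["operator"]
        values = expr.get("values", [])
        if operator == "In":
            ok = key in labels and labels[key] in values
        elif operator == "NotIn":
            # Objects without the key also satisfy NotIn.
            ok = key not in labels or labels[key] not in values
        elif operator == "Exists":
            ok = key in labels
        elif operator == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


# Example labels (assumed for this sketch).
production_cache = {"component": "redis", "tier": "cache", "environment": "production"}
dev_cache = {"component": "redis", "tier": "cache", "environment": "dev"}

selected = matches_selector(production_cache, SELECTOR)
rejected = matches_selector(dev_cache, SELECTOR)
```

With these sample labels, the production cache Pod is selected and the dev one is not, because its environment label is excluded by the NotIn expression.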
Organize All Your Kubernetes Workloads in All Your Clusters
Taking AWS terminology as a base, we can create an example labeling schema. First some definitions:
- Region: A physical location around the world
- Availability Zone: A group of data centers inside a region.
This means that your containers have the following hierarchy:
Region --> Availability Zone --> K8s Cluster --> Namespace --> Deployment --> Pod --> Containers
You can attach labels at every level in this hierarchy. This enables you to understand the full global scope of all layers and hierarchies of all your API objects. When you combine this with label selectors, you have a very large number of ways to filter your Kubernetes workloads.
Example: Find Pods by Labels to Get Their Pod Logs
Given a namespace your-namespace and a label query that identifies the Pods you are interested in, you can get the logs for all of those Pods. If the query matches more than one Pod, the command fetches the logs of each Pod in turn.
$ ns='qa' ; label='release=canary' ; kubectl get pods -n $ns -l $label -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | xargs -I {} kubectl -n $ns logs {}
$ ns='your-namespace' ; label='release=canary'
$ kubectl get pods -n $ns -l $label -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
This command gets a list of Pod names in the $ns namespace with the label release=canary. It outputs the Pod names, one per line.
$ | xargs -I {} kubectl -n $ns logs {}
This part of the command receives the list of Pod names and runs kubectl logs once for each of them. xargs is a staple of Linux shell scripting. The point of this example is that label selectors let you pass very selectively chosen lists of API objects to shell scripts for batch processing. This gets more useful the more clusters, namespaces and API objects you have.
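The same two-step pipeline can be sketched in Python for cases where the logs need further processing. This is a minimal sketch, assuming kubectl is installed and configured on the machine that runs it; the helper names run_kubectl, pod_names and logs_by_label are made up for this example. The run parameter exists only so that the kubectl call can be swapped out, for instance in a dry run.

```python
import subprocess
from typing import Callable, Dict, List

# Same jsonpath template as in the shell command: one Pod name per line.
JSONPATH = r'{range .items[*]}{.metadata.name}{"\n"}{end}'


def run_kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its standard output."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def pod_names(ns: str, label: str, run: Callable[[List[str]], str] = run_kubectl) -> List[str]:
    """List the names of the Pods in namespace `ns` that match `label`."""
    out = run(["get", "pods", "-n", ns, "-l", label, "-o", f"jsonpath={JSONPATH}"])
    return [line for line in out.splitlines() if line]


def logs_by_label(ns: str, label: str, run: Callable[[List[str]], str] = run_kubectl) -> Dict[str, str]:
    """Fetch the logs of every matching Pod, one Pod after another."""
    return {name: run(["-n", ns, "logs", name]) for name in pod_names(ns, label, run)}
```

Calling logs_by_label('qa', 'release=canary') would return a dictionary that maps each matching Pod name to its log text, which mirrors what the xargs pipeline prints.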
Kubernetes Recommended Labels
The official Kubernetes documentation recommends that you use the following labels, all of which share the prefix app.kubernetes.io/:
- name: the name of the application
- instance: a unique name identifying the instance
- version: the semantic version number
- component: the component within your logical architecture
- part-of: the name of the higher-level application this object is part of
- managed-by: the tool used to manage the application, helm for example
An example from the documentation:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/instance: mysql-abcxzy
    app.kubernetes.io/version: "5.7.21"
    app.kubernetes.io/component: database
    app.kubernetes.io/part-of: wordpress
    app.kubernetes.io/managed-by: helm
You should also define such labels on all API objects in your cluster. For example, a WordPress application may use:
- PersistentVolume
- PersistentVolumeClaim
- Deployment
- Pods
- Containers
- Service
- Ingress
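You can relate all the above API objects via the label app.kubernetes.io/part-of: wordpress.
When you do this, a single command can list the labeled objects in one go. Keep in mind that kubectl get all covers only the most common resource types, such as Pods, Services and Deployments; PersistentVolumes, PersistentVolumeClaims and Ingresses have to be requested by type with the same -l selector.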
$ kubectl get all -l 'app.kubernetes.io/part-of=wordpress'
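If your manifests are generated by a script, one way to keep these labels consistent is to build them in one place and stamp them onto every object. The following is a minimal sketch under that assumption; recommended_labels and with_labels are hypothetical helper names, and the manifests are plain Python dictionaries rather than objects from any Kubernetes client library.

```python
import copy
from typing import Any, Dict

PREFIX = "app.kubernetes.io/"


def recommended_labels(name: str, instance: str, version: str,
                       component: str, part_of: str, managed_by: str) -> Dict[str, str]:
    """Build the six recommended labels with their shared prefix."""
    return {
        PREFIX + "name": name,
        PREFIX + "instance": instance,
        PREFIX + "version": version,
        PREFIX + "component": component,
        PREFIX + "part-of": part_of,
        PREFIX + "managed-by": managed_by,
    }


def with_labels(manifest: Dict[str, Any], labels: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `manifest` with `labels` merged into metadata.labels."""
    labeled = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    labeled.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    return labeled


# Values taken from the StatefulSet example above.
labels = recommended_labels(
    name="mysql", instance="mysql-abcxzy", version="5.7.21",
    component="database", part_of="wordpress", managed_by="helm",
)

# A bare-bones Service manifest (assumed for this sketch).
service = {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "mysql"}}
labeled_service = with_labels(service, labels)
```

Applying the same labels dictionary to every object of the application is what makes a selector on app.kubernetes.io/part-of=wordpress return all of them.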
You may filter on labels in an equality-based manner:
environment = production
tier != frontend
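You may also filter on labels in a set-based manner:
environment in (production, qa)
tier notin (frontend, backend)
A bare key, or a key prefixed with !, selects objects on whether that label key is present or absent, whatever its value:
us-west-1
!us-west-1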
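To show how these selector strings are interpreted, here is a small Python sketch that evaluates them against a dictionary of labels. It is a simplified illustration, not the parser kubectl uses: the function names are invented for this example, input validation is omitted, and comma-separated requirements are combined with a logical AND.

```python
import re
from typing import List, Mapping

_SET_BASED = re.compile(r"^(\S+)\s+(in|notin)\s*\((.*)\)$")
_EQUALITY = re.compile(r"^([^\s=!]+)\s*(==|=|!=)\s*(\S+)$")


def split_requirements(selector: str) -> List[str]:
    """Split a selector on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in selector:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def requirement_matches(req: str, labels: Mapping[str, str]) -> bool:
    """Evaluate a single requirement such as 'tier notin (frontend, backend)'."""
    match = _SET_BASED.match(req)
    if match:
        key, operator, raw_values = match.groups()
        values = {value.strip() for value in raw_values.split(",")}
        if operator == "in":
            return key in labels and labels[key] in values
        # notin also matches objects that lack the key entirely.
        return key not in labels or labels[key] not in values
    match = _EQUALITY.match(req)
    if match:
        key, operator, value = match.groups()
        if operator == "!=":
            return labels.get(key) != value  # a missing key counts as "not equal"
        return labels.get(key) == value
    if req.startswith("!"):
        return req[1:].strip() not in labels  # key must be absent
    return req in labels  # key must be present


def selector_matches(selector: str, labels: Mapping[str, str]) -> bool:
    """Return True if `labels` satisfies every requirement in `selector`."""
    return all(requirement_matches(req, labels) for req in split_requirements(selector))


# Sample labels (assumed for this sketch).
pod = {"environment": "production", "release": "canary", "tier": "cache"}
is_canary_in_production = selector_matches(
    "environment in (production), release in (canary)", pod
)
```

For the sample Pod, is_canary_in_production is True, while a selector such as environment notin (production) would reject it.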