Kubernetes Patterns: The Batch Job Pattern
Pods Common Pattern
A Pod may host one or more containers, all of which are treated as a single unit. There are several ways to create Pods. For example:
- Naked Pods: These are Pods that you create directly through a definition file. In general, this is not a recommended practice because the created Pods are not managed by a controller. If the Pod is terminated or the node crashes, it will not be restarted or rescheduled to a different node.
- ReplicaSet: This controller is suitable for Pods that need to run continuously, such as web servers or APIs. It restarts a Pod when it fails.
- DaemonSets: This controller also runs Pods continuously, and additionally ensures that one runs on every node of the cluster. A typical scenario for using DaemonSets is collecting logs from the cluster nodes and sending them to a log aggregator such as Elasticsearch.
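For illustration, a minimal DaemonSet for the log-collection scenario might be sketched like this (the `log-collector` name, labels, and `fluentd` image are placeholders; any log-shipping agent would do):

```yaml
# Sketch of a DaemonSet that runs a log-collection agent on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector        # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: log-collector
        image: fluentd       # example image; substitute your log-shipping agent
```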
As you can see, the common thread in these patterns is keeping the Pod running at all times: you always want your service to continue responding to requests. However, in some cases, you need a container to run once and then terminate.
Pods for running tasks
As mentioned, Kubernetes uses Pods as its building block. Any task that you need Kubernetes to run is done through a Pod. The difference between one Pod type and another lies in the controller that manages them. Kubernetes provides the Job controller, which runs one or more Pods and ensures that all of them terminate successfully. Let's demonstrate with an example. Assume that you have a web application that reads a random number from a file. To create this number, we create a Job controller that spawns a Pod. The Pod writes the number to the file and exits. A Job definition may look like this:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-creator
spec:
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: seed-creator
    spec:
      restartPolicy: OnFailure
      containers:
      - image: bash
        name: seed-creator
        # bash -c takes the whole script as a single argument;
        # splitting it into separate array items would break the redirection
        command: ["bash", "-c", "echo $RANDOM > /random.txt"]
```
A job definition contains the following unique fields:
- completions: the number of Pods that need to run to completion for the Job to be considered successful. In our example, we specified only one Pod, but you can specify more if needed.
- parallelism: the number of Pods that can run in parallel. In our example it is set to 1, so only one Pod runs at a time; subsequent Pods, if any, run after the current one finishes.
- restartPolicy: This is optional in a ReplicaSet, but it's required in a Job. Since a Job is meant to run a Pod to completion, the only allowed values are Never and OnFailure.
Using a Job vs. a Bare Pod
You may be asking why bother with a dedicated definition for a Job when a naked (bare) Pod could seemingly do the same work. For example, the following Pod definition achieves the same goal as the Job in the previous example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seed-creator
spec:
  restartPolicy: OnFailure
  containers:
  - image: bash
    name: seed-creator
    # same single-argument form as in the Job example above
    command: ["bash", "-c", "echo $RANDOM > /random.txt"]
```
Applying this definition creates a Pod that executes the same command with the same options. So, why (and when) should you use a Job?
- You control the number of times the Pod should run through the completions parameter. With a bare Pod, you would have to do this manually.
- Using the parallelism parameter, you can scale up the number of running Pods.
- If the node fails while the Job's Pod is running, the Job controller reschedules the Pod to a healthy node. A bare Pod remains failed until you manually delete it and recreate it on another node.
As you can see, a Job lifts a lot of the administrative burden by automatically managing its Pods.
The Job Patterns
The completions and parallelism parameters allow you to utilize different Job patterns depending on your environment and requirements. Let's have a look:
Single Job Pattern: This pattern is used when you want to execute only one task. Use it by setting both completions and parallelism to 1, or by omitting them so they take their default values. The Job is considered done when the Pod's exit status is 0. Refer to the first example in this article, which uses the Single Job pattern.
Fixed-count Job Pattern: Use this pattern when a task must be executed a specific number of times. For example, you need to read exactly five files and insert their contents into a database. After each file is read, it gets deleted so that the next iteration reads the following file. For this pattern, set completions to a value higher than 1. The parallelism parameter is optional here.
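A fixed-count Job for the five-file scenario might be sketched like this (the `file-importer` name and image are hypothetical; the container itself is assumed to read one file, insert its contents, and delete it before exiting):

```yaml
# Sketch of a fixed-count Job: five completions, run one at a time
apiVersion: batch/v1
kind: Job
metadata:
  name: file-importer        # hypothetical name
spec:
  completions: 5             # five Pods must terminate successfully
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: importer
        image: file-importer:latest   # hypothetical image
```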
Work-queue Job Pattern: Use this pattern when you have an undefined number of tasks that need to be done. The typical use case is message queues. If you need to consume messages from a message queue until it is empty, create a Job, set the completions count to 1 (or omit it), and set parallelism to a value greater than 1 for higher throughput.
A Job of this kind is considered successful when at least one Pod terminates successfully and all other Pods terminate as well. Since more than one Pod runs in parallel, it is the responsibility of each Pod to coordinate with the others regarding which items it is working on. The first Pod that detects an empty queue terminates with exit status 0. The other Pods end as soon as they finish processing the messages they are consuming.
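A work-queue Job might be sketched like this (the `queue-consumer` name and image are placeholders; the consumer application itself is assumed to pull messages from the queue and exit with status 0 when the queue is empty):

```yaml
# Sketch of a work-queue Job: completions omitted, several parallel workers
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer       # hypothetical name
spec:
  parallelism: 3             # three Pods consume from the queue concurrently
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: consumer
        image: queue-consumer:latest   # hypothetical image
```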
If you have an indefinite number of work items that need to be processed (such as Twitter messages), then you should consider other controllers like ReplicaSets. The reason is that such queues need Pods that are always running and are restarted when they fail.