Kubernetes Patterns: The Cron Job Pattern

Scheduled Job Challenges

Cron jobs have been part of UNIX since its early versions. By the time GNU and Linux came into existence, cron was already an established part of the system. A cron job is simply a command, program, or shell script that is scheduled to run periodically. For example, a program that rotates logs must be run from time to time.

However, as an application grows in scale and needs high availability, its cron jobs must be highly available as well. This raises the following challenges:

  • If we run multiple hosts for high availability, which node handles the cron job?
  • What happens if multiple identical cron jobs run simultaneously?

One possible solution to these challenges is to create a higher-level "controller" that manages cron jobs. The controller is installed on each node, and a leader node gets elected. The leader is the only node that can execute cron jobs. If that node goes down, another node gets elected. However, you would need to install such a controller from a third-party vendor or write your own. Fortunately, you can execute periodic tasks using the Kubernetes CronJob controller, which adds a time dimension to the traditional Job controller. In this article, we demonstrate the CronJob type, its use cases, and the types of problems it solves.

A Cron Job Example

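apiVersion: batch/v1   # batch/v1beta1 was removed in Kubernetes 1.25; use it only on clusters older than 1.21
kind: CronJob
metadata:
  name: sender
spec:
  schedule: "*/15 * * * *"   # every fifteen minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: bash
            name: sender
            # Simulate the sending action with a simple echo
            command: ["bash","-c","echo 'Sending information to API/database'"]
          restartPolicy: OnFailure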

The purpose of the above definition file is to create a CronJob resource that sends data to an API or a database every fifteen minutes. We used the echo command from the bash Docker image to simulate the sending action to keep the example simple. Let's see the critical properties in this definition.

.spec.schedule: the schedule parameter defines how frequently the job should run. It uses the same cron format as Linux. If you are not familiar with the cron format, it's straightforward. There are five slots: minute, hour, day of the month, month, and day of the week. To ignore one of them, place a star (*) in that slot.

You can also use the */x notation to denote every x units. In our example, */15 means every fifteen minutes; the remaining slots contain *, so the job runs on all hours, all days, all months, and all days of the week. For more information about the cron format, you can refer to the crontab documentation. Like the Job resource, the CronJob uses a Pod template to define the containers that the Pod hosts and the specs of those containers. .spec.jobTemplate.spec.template.spec.restartPolicy defines whether the Pod's containers are restarted when they fail. You can set this value to Never or OnFailure.
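To make the format concrete, here is a rough sketch of a few schedule values. Only the first one comes from our example; the others are illustrative values and are not part of the sender definition.

```yaml
# Slot order: minute  hour  day-of-month  month  day-of-week
schedule: "*/15 * * * *"     # every fifteen minutes (the value used by sender)
# Illustrative alternatives; only one schedule key may be set at a time:
# schedule: "0 * * * *"      # at minute 0 of every hour
# schedule: "30 2 * * *"     # every day at 02:30
# schedule: "0 9 * * 1"      # every Monday at 09:00
# schedule: "0 0 1 * *"      # at midnight on the first day of every month
```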

Potential Cases for Cron Jobs

My Cron Job Didn't Start On Time:

In some cases, the CronJob may not get triggered at the specified time. In such an event, there are two scenarios:

  • We need to execute the job that didn't start even if it was delayed.
  • We need to execute the job that didn't start only if a specific time limit was not crossed.

In our first example, the job sends information to an API that expects this information every fifteen minutes. If the data arrives late, it's useless, and the API automatically discards it. For such cases, the CronJob resource offers the .spec.startingDeadlineSeconds parameter. If the job misses its scheduled time but the delay has not exceeded that number of seconds, the job still gets executed. Otherwise, that run is skipped and the job runs at the next scheduled time. Notice that if this parameter is not set, the CronJob controller counts the schedules missed since the last scheduled time. If more than 100 schedules were missed, the controller does not start the job and logs an error.
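As a minimal sketch, assuming the API in our example still accepts data that is up to two minutes late, the deadline could be set as follows. The value 120 is an assumption for illustration; choose it based on how late a run can still be useful.

```yaml
# Fragment of the sender CronJob; metadata and jobTemplate are unchanged and omitted here
spec:
  schedule: "*/15 * * * *"
  # Assumed value: start a delayed run only if it is at most 120 seconds late
  startingDeadlineSeconds: 120
```

Very small values are risky. The Kubernetes documentation notes that a deadline below 10 seconds may keep the CronJob from being scheduled at all, because the controller only checks every 10 seconds.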

My CronJob is Taking so Long that It Would Span to the Next Execution Time:

If the CronJob takes too long to finish, you may end up in a situation where another instance of the job kicks in at its scheduled time. To handle this, the CronJob resource offers the .spec.concurrencyPolicy parameter, which gives you the following options:

  • concurrencyPolicy: Allow permits concurrent instances of the same CronJob to run. This is the default behavior.
  • concurrencyPolicy: Replace kills the current job if it hasn't finished yet and starts the newly scheduled one.
  • concurrencyPolicy: Forbid skips the newly scheduled run while the previous job is still running. Use it when killing a running job is undesirable and it must complete before a new one starts.
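For the sender job, Forbid might be a reasonable choice, since two overlapping runs could send the same data twice. That is an assumption about the example workload, not a general recommendation. A sketch of the setting:

```yaml
# Fragment of the sender CronJob; metadata and jobTemplate are unchanged and omitted here
spec:
  schedule: "*/15 * * * *"
  # Skip a newly scheduled run while the previous Job is still running
  concurrencyPolicy: Forbid
```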

I Need to Execute the CronJob Only Once:

In Linux, we have the at command, which lets you schedule a program to run only once. Similar functionality can be achieved on Kubernetes with the CronJob resource and its .spec.suspend parameter. When this parameter is set to true, it suspends all subsequent CronJob executions. However, be aware that you should also use startingDeadlineSeconds with it. The reason is that if you change the suspend value back to false, Kubernetes examines all the jobs that were missed while the CronJob was suspended. If fewer than 100 schedules were missed, the missed jobs are scheduled immediately. With the startingDeadlineSeconds setting, you can avoid this behavior, as it prevents missed jobs from being executed once they are past the defined number of seconds.
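A sketch of how the two settings could be combined is shown below. The deadline of 120 seconds is an assumed value for illustration.

```yaml
# Fragment of the sender CronJob; metadata and jobTemplate are unchanged and omitted here
spec:
  schedule: "*/15 * * * *"
  # No new Jobs are started while this is true
  suspend: true
  # Assumed value: after resuming, runs missed by more than 120 seconds are not started
  startingDeadlineSeconds: 120
```

Note that suspending a CronJob only affects future runs. Jobs that have already started keep running.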

Does the CronJob Keep a History of the Jobs that Succeeded and Failed?

Most of the time, you need to know what happened when the cron job last ran. If a database update didn't occur, an API server wasn't updated, or any other action that was supposed to result from the CronJob run didn't happen, you would need to know why. By default, the CronJob remembers the last three successful jobs and the last failed one. However, those values can be changed to your preference by setting the following parameters:

  • .spec.successfulJobsHistoryLimit: if not set, it defaults to 3. It specifies the number of successful jobs to keep in history.

  • .spec.failedJobsHistoryLimit: if not set, it defaults to 1. It specifies the number of failed jobs to keep in history.

If you don't need to keep any history of execution, you can just set both values to 0.
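As an illustration, the fragment below keeps a slightly longer history than the defaults. The values 5 and 2 are assumptions chosen for the example.

```yaml
# Fragment of the sender CronJob; metadata and jobTemplate are unchanged and omitted here
spec:
  schedule: "*/15 * * * *"
  successfulJobsHistoryLimit: 5   # assumed value; the default is 3
  failedJobsHistoryLimit: 2       # assumed value; the default is 1
```

The retained Jobs can then be listed with kubectl get jobs.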

copyright©2021 ylcnky all rights reserved