
Queue Master

Grant Carthew edited this page Oct 6, 2016 · 44 revisions

Description

The Queue Master role in rethinkdb-job-queue ensures that delayed and failed jobs get processed and that the database is cleaned.

When creating a Queue object within rethinkdb-job-queue you can customize its operation with configuration options. One of the options is called the masterInterval. If this option is set to false, the Queue object will not be a Queue Master. If the masterInterval option is set to a positive Integer then you will have a Queue Master. See the Queue Options document for more detail.

The value of the masterInterval represents a repeat time period in milliseconds. The default value for the masterInterval is 310000 milliseconds or 5 minutes and 10 seconds. This is 10 seconds past the default job timeout value of 300000 milliseconds or 5 minutes. The extra 10 seconds is to assist in detecting failed jobs directly after queue startup. During long term operation the extra 10 seconds will make no difference.
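As a minimal sketch of the arithmetic above, the following shows how the default interval relates to the default job timeout, and what the options for a master and a non-master Queue object might look like. The constant names, the queue name, and the isQueueMaster helper are illustrative assumptions; only the masterInterval option and the numeric defaults come from this document.

```javascript
// Defaults described in this document, in milliseconds.
const DEFAULT_JOB_TIMEOUT = 300000 // 5 minutes
const STARTUP_MARGIN = 10000 // the extra 10 seconds

// 310000 ms, which is 5 minutes and 10 seconds.
const DEFAULT_MASTER_INTERVAL = DEFAULT_JOB_TIMEOUT + STARTUP_MARGIN

// Hypothetical option objects for two Queue objects on the same queue.
// See the Queue Options document for the full list of options.
const masterOptions = { name: 'ExampleQueue', masterInterval: DEFAULT_MASTER_INTERVAL }
const workerOptions = { name: 'ExampleQueue', masterInterval: false }

// Illustrative helper: true when the options describe a Queue Master,
// that is, when masterInterval is a positive integer.
function isQueueMaster (options) {
  return Number.isInteger(options.masterInterval) && options.masterInterval > 0
}
```

With these objects, isQueueMaster(masterOptions) is true and isQueueMaster(workerOptions) is false.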

When the time period elapses, the Queue Master will review the database table backing the queue. This is called the Queue Review process.

A Queue Master will perform four tasks within the job queue during the Queue Review process:

  1. Failed Node.js Process
     • Discover and enable jobs that have failed due to the Node.js process crashing or hanging.
  2. Remove Finished Jobs
     • Remove completed, cancelled, or terminated jobs from the queue.
  3. Delayed Job Processing
     • Enable processing of delayed jobs or failed jobs waiting for retry.
  4. Update Queue State
     • At the completion of the review process the queue State Document will be updated.

If you do not enable a Queue Master against a queue, these tasks will still be performed during Node.js process start as long as a handler function has been added to a Queue object. See the Queue.process document for more detail.

Failed Node.js Process

During normal queue operation, Queue objects processing jobs will detect when a job has taken too long and is operating past its timeout value. If this situation occurs the job status in the database is set to failed and the job will be delayed based on the retryDelay, retryCount, and retryMax values. See Job Retry for more detail.

However, if a Node.js process fails for any reason whilst working on a job, the job will not be completed and will remain in the database with an active status. This leaves an orphaned job.

To ensure the job is not forgotten, a Queue Master will repeatedly review the queue database backing table based on the masterInterval. When the Queue Master reviews the queue backing table, it looks for jobs that have a status of active and are past their dateEnable value. The dateEnable value is set when the job is created or when it is retrieved from the database for processing. Again, for more detail on the dateEnable value see the Job Retry document.

The queue review process will update the job status based on the retryCount and retryMax values:

  • If the job's retryCount value is less than the retryMax value then the job status will be set to failed and the retryCount value will be incremented. This job will now be ready for processing.

  • If the job's retryCount value is equal to the retryMax value then the job status will be set to terminated and the job is considered finished.
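The decision described in the two points above can be sketched as a plain function. This is a simplified in-memory model, not the library's implementation, which works against the database. The function name reviewOrphanedJob is hypothetical; the property names and status values are the ones used in this document.

```javascript
// Simplified model of the review decision for a single job.
// `job` is a plain object and `now` is a Date.
function reviewOrphanedJob (job, now) {
  // Only active jobs that are past their dateEnable value are affected.
  const orphaned = job.status === 'active' && job.dateEnable < now
  if (!orphaned) return job

  if (job.retryCount < job.retryMax) {
    // Retries remain: mark as failed and count the retry.
    return { ...job, status: 'failed', retryCount: job.retryCount + 1 }
  }

  // No retries remain: the job is finished.
  return { ...job, status: 'terminated' }
}
```

For example, an active job with a retryCount of 0, a retryMax of 3, and a dateEnable in the past would come back with a status of failed and a retryCount of 1.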

It is possible for a normal job that is still being processed to extend past its initial timeout value and be marked as failed by the Queue Master review process. To prevent this, call the Job.progress method on the Job object. When progress for a job is updated, the dateEnable value and the timeout process also get updated. Therefore, calling Job.progress periodically within the job timeout period will prevent the job from erroneously being marked as failed on review.

Remove Finished Jobs

In this context a finished job is defined as a job in the queue that has a status of either completed, cancelled, or terminated.

Once a job has finished processing it will no longer be an active part of the queue. The job details in the database including its log entries and other properties are just taking up space.

If you are processing thousands of jobs a day this might not be a big deal, and you may be happy to leave the job details in the database for future reference. However, if you are processing millions of jobs a day, the space taken up by the completed jobs could add up over a year or more. If that is the case then you will want to remove finished jobs from the database to free up space.

Fortunately Queue objects have three options for cleaning up jobs once they are finished based on the removeFinishedJobs Queue Option.

Two of the values you can set the removeFinishedJobs Queue option to, true and false, are ignored by the Queue Master review process.

  • If you set the removeFinishedJobs option to true, finished jobs will be removed from the database immediately.

  • If you set the removeFinishedJobs option to false, jobs will never be removed from the database no matter what their status is.

The third value you can assign to the removeFinishedJobs Queue option is a positive Integer. This number represents a time period in milliseconds.

Jobs will be considered eligible to be removed once the current date is later than their dateFinished value plus the removeFinishedJobs time period.

The Queue Master review process will permanently remove these jobs from the queue.

Setting the removeFinishedJobs value to a low number such as 7 days (in milliseconds) would give you enough time to use the job logs to help you debug issues while still keeping your queue database clean.

Alternatively, setting removeFinishedJobs value to a high number such as 365 days (in milliseconds) would give you plenty of data for analysis.
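The eligibility rule and the day-based values above can be sketched as follows. This is an in-memory model under stated assumptions rather than the library's code: the helper names daysToMs and isRemovableOnReview are hypothetical, while the status values, dateFinished, and removeFinishedJobs are taken from this document.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Convert a number of days into milliseconds for the removeFinishedJobs option.
const daysToMs = days => days * MS_PER_DAY

// Statuses this document treats as finished.
const FINISHED_STATUSES = ['completed', 'cancelled', 'terminated']

// Would the review process remove this job? `now` is a Date.
function isRemovableOnReview (job, removeFinishedJobs, now) {
  // The values true and false are not acted on by the review process.
  if (!Number.isInteger(removeFinishedJobs) || removeFinishedJobs <= 0) return false
  if (!FINISHED_STATUSES.includes(job.status)) return false
  // Eligible once the retention period after dateFinished has elapsed.
  return now.getTime() > job.dateFinished.getTime() + removeFinishedJobs
}
```

Here daysToMs(7) is 604800000 and daysToMs(365) is 31536000000, so either result could be assigned to removeFinishedJobs.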

Please consider disabling the removeFinishedJobs process if you can. It can always be enabled at a later date.

Delayed Job Processing

Important: The following is only valid if the Queue Master Queue object has a process handler assigned. If it does not, the Update Queue State task below will enable delayed job processing.

In a busy queue the database will be queried upon completion of jobs in order to find more jobs that need processing. This includes finding jobs with a status of waiting or failed with the current date after the job dateEnable value.
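As a rough illustration of that selection, the sketch below applies the same condition to an in-memory array. The real queue runs a database query, so the findReadyJobs function and the array of plain job objects are assumptions made for the example.

```javascript
// Select jobs that are ready for processing at the time `now` (a Date).
// A job qualifies when its status is waiting or failed and the current
// date is after its dateEnable value.
function findReadyJobs (jobs, now) {
  return jobs.filter(job =>
    (job.status === 'waiting' || job.status === 'failed') &&
    job.dateEnable < now
  )
}
```

A failed job whose dateEnable is still in the future is skipped, which is why a delayed retry can sit idle until something triggers another query.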

If the last job in the queue fails and the retryDelay value is not 0, the job will be delayed for retry and the queue will enter an idle state. There may be other jobs delayed in the queue also.

Without something initiating the queue to process jobs, the last job will remain in the database until more jobs are added to the queue.

To prevent this situation from delaying the last job well beyond its dateEnable value, the Queue Master database review process calls the queue process task. The queue process task will query the database discovering the delayed jobs and retrieve them for processing. Again, this is only if the process handler is populated on the Queue Master.

Update Queue State

Finally, at the completion of the review process the Queue Master will update the State Document to a state of reviewed. This is an important change in a distributed processing queue environment.

If the queue is currently quiet with no jobs being processed, there is nothing to prompt the Queue objects to go to work. Whilst time is passing, some jobs in the queue may become available for processing due to their dateEnable value. As soon as the current date has passed the job's dateEnable value, the job is ready for processing.

To remedy this situation and to initiate processing of delayed jobs, the Queue Master review process completes by changing the State Document. This change is detected by all Queue objects connected to the same queue. If a Queue object detects a state update defined as reviewed, it will initiate a process restart function to query the database for more work.
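The reaction of a Queue object to a state change can be modelled with a small sketch. It assumes the State Document exposes its state in a property named state, and it uses a hypothetical restartProcessing callback to stand in for the internal process restart function; see the State Document page for the actual document shape.

```javascript
// Build a listener that reacts to State Document changes.
// `restartProcessing` is a hypothetical callback that queries the
// database for more work.
function createStateListener (restartProcessing) {
  return function onStateChange (stateDocument) {
    // Only a state of reviewed triggers a process restart.
    if (stateDocument.state === 'reviewed') {
      restartProcessing()
      return true
    }
    return false
  }
}
```

Each Queue object connected to the queue would hold its own listener, so a single review by the Queue Master prompts every connected worker to look for jobs.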

See the State Document and Delayed Job documents for more detail.
