Queue Master
The Queue Master role in rethinkdb-job-queue ensures that delayed and failed jobs get processed and that the database is cleaned.
When creating a Queue object within rethinkdb-job-queue you can customize its operation with configuration options. One of these options is called masterInterval. If this option is set to false, the Queue object will not be a Queue Master. If the masterInterval option is set to a positive integer, the Queue object will be a Queue Master. See the Queue Options document for more detail.
The value of masterInterval represents a repeat time period in milliseconds. The default value is 310000 milliseconds, or 5 minutes and 10 seconds. This is 10 seconds past the default job timeout value of 300000 milliseconds, or 5 minutes. The extra 10 seconds helps to detect failed jobs directly after queue startup. During long-term operation the extra 10 seconds makes no difference.
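As a rough illustration of how the masterInterval values described above could be interpreted, the sketch below uses a hypothetical helper named describeMasterRole. The helper and the constant names are assumptions made for this example and are not part of rethinkdb-job-queue; only the option name and the default values come from the text.

```javascript
// Default values quoted in the text above.
const DEFAULT_JOB_TIMEOUT_MS = 300000 // 5 minutes
const DEFAULT_MASTER_INTERVAL_MS = DEFAULT_JOB_TIMEOUT_MS + 10000 // 310000

// Hypothetical helper: interprets a masterInterval value the way the text describes.
function describeMasterRole (masterInterval = DEFAULT_MASTER_INTERVAL_MS) {
  if (masterInterval === false) {
    // Not a Queue Master: no periodic review is scheduled.
    return { isMaster: false, reviewEveryMs: null }
  }
  if (Number.isInteger(masterInterval) && masterInterval > 0) {
    // Queue Master: the review repeats every masterInterval milliseconds.
    return { isMaster: true, reviewEveryMs: masterInterval }
  }
  throw new TypeError('masterInterval must be false or a positive integer')
}

const workerOnly = describeMasterRole(false)
const master = describeMasterRole()
console.log(workerOnly, master)
```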
When the time period elapses, the Queue Master will review the database table backing the queue. This is called the Queue Review process.
A Queue Master will perform four tasks within the job queue during the Queue Review process:
- Discover and enable jobs that have failed due to the Node.js process crashing or hanging.
- Remove completed, cancelled, or terminated jobs from the queue.
- Enable processing of delayed jobs or failed jobs waiting for retry.
- At the completion of the review process the queue State Document will be updated.
If you do not enable a Queue Master against a queue, these tasks will still be performed during Node.js process start as long as a handler function has been added to a Queue object. See the Queue.process document for more detail.
During normal queue operation, Queue objects processing jobs will detect when a job has taken too long and is operating past its timeout value. If this situation occurs, the job status in the database is set to failed and the job will be delayed based on the retryDelay, retryCount, and retryMax values. See Job Retry for more detail.
However, if a Node.js process fails for any reason whilst working on a job, the job will not be completed and will remain in the database with an active status, leaving an orphaned job.
To ensure the job is not forgotten, a Queue Master will repeatedly review the queue database backing table based on the masterInterval. When the Queue Master reviews the queue backing table, it looks for jobs that have a status of active and are past their dateEnable value. The dateEnable value is set when the job is created or when it is retrieved from the database for processing. Again, for more detail on the dateEnable value see the Job Retry document.
The queue review process will update the job status based on the retryCount and retryMax values:
- If the job's retryCount value is less than the retryMax value, the job status will be set to failed and the retryCount value will be incremented. This job will now be ready for processing.
- If the job's retryCount value is equal to the retryMax value, the job status will be set to terminated and the job is considered finished.
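The two rules above can be sketched as a small in-memory function. This is only an illustration: the function name reviewJob is hypothetical, the job is a plain object, and the library itself performs the equivalent update in the database rather than in application code.

```javascript
// Hypothetical in-memory version of the review step for a single job.
function reviewJob (job, now = new Date()) {
  // Only active jobs that are past their dateEnable value are treated as orphaned.
  const orphaned = job.status === 'active' && job.dateEnable < now
  if (!orphaned) return job

  if (job.retryCount < job.retryMax) {
    // Retries remain: mark as failed and count the retry.
    return { ...job, status: 'failed', retryCount: job.retryCount + 1 }
  }
  // No retries left: the job is finished.
  return { ...job, status: 'terminated' }
}

const now = new Date('2020-01-01T00:10:00Z')
const orphan = {
  status: 'active',
  retryCount: 0,
  retryMax: 3,
  dateEnable: new Date('2020-01-01T00:05:00Z')
}
const reviewed = reviewJob(orphan, now)
console.log(reviewed.status, reviewed.retryCount) // failed 1
```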
It is possible for a normal job being processed to extend past its initial timeout value and be marked as failed by the Queue Master review process. To prevent this, call the Job.progress method on the Job object. When progress for a job is updated, the dateEnable value and the timeout process also get updated. Therefore calling Job.progress periodically within the job timeout period will prevent the job from erroneously being marked as failed on review.
In this context a finished job is defined as a job in the queue that has a status of either completed, cancelled, or terminated.
Once a job has finished processing it will no longer be an active part of the queue. The job details in the database including its log entries and other properties are just taking up space.
If you are processing thousands of jobs a day this might not be a big deal, and you may be happy to leave the job details in the database for future reference. However, if you are processing millions of jobs a day, the space taken up by the finished jobs could add up over a year or more. In that case you will want to remove finished jobs from the database to free up space.
Fortunately, Queue objects have three options for cleaning up jobs once they are finished, based on the removeFinishedJobs Queue option.
Two of the values you can set the removeFinishedJobs Queue option to, true and false, are ignored by the Queue Master review process.
- If you set the removeFinishedJobs option to true, finished jobs will be removed from the database immediately.
- If you set the removeFinishedJobs option to false, jobs will never be removed from the database no matter what their status is.
The third value you can assign to the removeFinishedJobs Queue option is a positive integer. This number represents a time period in milliseconds.
A job becomes eligible for removal once the current date is later than its dateFinished value plus the removeFinishedJobs period.
The Queue Master review process will permanently remove these jobs from the queue.
Setting the removeFinishedJobs value to a low number such as 7 days (in milliseconds) would give you enough time to use the job logs to help you debug issues while still keeping your queue database clean.
Alternatively, setting the removeFinishedJobs value to a high number such as 365 days (in milliseconds) would give you plenty of data for analysis.
Please consider disabling the removeFinishedJobs process if you can. It can always be enabled at a later date.
Important: The following is only valid if the Queue Master Queue object has a process handler assigned. If it does not, the Update Queue State task below will enable delayed job processing.
In a busy queue the database will be queried upon completion of jobs in order to find more jobs that need processing. This includes finding jobs with a status of waiting or failed where the current date is after the job's dateEnable value.
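The selection described above can be pictured as a filter over job records. The sketch below runs against an in-memory array with a hypothetical function named findReadyJobs; it is an assumption-laden illustration, since the library performs this lookup as a database query.

```javascript
// Hypothetical in-memory equivalent of the "find more work" query.
function findReadyJobs (jobs, now = new Date()) {
  return jobs.filter((job) =>
    (job.status === 'waiting' || job.status === 'failed') &&
    job.dateEnable < now // the current date is after the job's dateEnable value
  )
}

const now = new Date('2020-01-01T12:00:00Z')
const jobs = [
  { id: 'a', status: 'waiting', dateEnable: new Date('2020-01-01T11:00:00Z') },
  { id: 'b', status: 'failed', dateEnable: new Date('2020-01-01T13:00:00Z') }, // still delayed
  { id: 'c', status: 'failed', dateEnable: new Date('2020-01-01T11:30:00Z') },
  { id: 'd', status: 'completed', dateEnable: new Date('2020-01-01T10:00:00Z') }
]
const readyIds = findReadyJobs(jobs, now).map((job) => job.id)
console.log(readyIds) // [ 'a', 'c' ]
```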
If the last job in the queue fails and the retryDelay value is not 0, the job will be delayed for retry and the queue will enter an idle state. There may be other jobs delayed in the queue also.
Without something initiating the queue to process jobs, the last job will remain in the database until more jobs are added to the queue.
To prevent this situation from delaying the last job well beyond its dateEnable value, the Queue Master database review process calls the queue process task. The queue process task will query the database, discover the delayed jobs, and retrieve them for processing. Again, this is only if the process handler is populated on the Queue Master.
Finally, at the completion of the review process the Queue Master will update the State Document to a state of reviewed. This is an important change in a distributed processing queue environment.
If the queue is currently quiet with no jobs being processed, there is nothing to prompt the Queue objects to go to work. Whilst time is passing, some jobs in the queue may become available for processing due to their dateEnable value. As soon as the current date has passed the job's dateEnable date, the job is ready for processing.
To remedy this situation and to initiate processing of delayed jobs, the Queue Master review process completes by changing the State Document. This change is detected by all Queue objects connected to the same queue. If a Queue object detects a state update defined as reviewed, it will initiate a process restart function to query the database for more work.
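The notification pattern described above can be simulated with Node's built-in EventEmitter. This is only a toy model: the stateFeed emitter stands in for the State Document change notifications, and createWorker is a hypothetical function; none of these names come from rethinkdb-job-queue.

```javascript
const { EventEmitter } = require('events')

// Stand-in for State Document change notifications; for illustration only.
const stateFeed = new EventEmitter()

// Hypothetical worker: records a restart whenever the state becomes 'reviewed'.
function createWorker (name, restarts) {
  stateFeed.on('state', (state) => {
    if (state === 'reviewed') {
      restarts.push(name) // a real Queue object would query the database for work here
    }
  })
}

const restarts = []
createWorker('queue-a', restarts)
createWorker('queue-b', restarts)

// The Queue Master finishes its review and updates the state.
stateFeed.emit('state', 'reviewed')
console.log(restarts) // [ 'queue-a', 'queue-b' ]
```

Every connected worker reacts to the single state change, which is why one Queue Master is enough to wake an otherwise idle group of Queue objects.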
See the State Document and Delayed Job documents for more detail.