
[Ingest] Recover from a queue error #2532

@mreyescdl

Currently, we fail a Job if queueing fails at submission.
This was triggered in Stage when 8K small objects were submitted in parallel.

This ticket is to introduce retry logic in the HandlerSubmit handler.
The relevant code segment is shown below.

                    } catch (Exception e) {
                        e.printStackTrace();
                        String msg = "Failed to create Job queue submission: " + jproperties.toString();
                        System.err.println("[error] " + msg);

                        // Batch failure
                        if (job != null) {
                            try {
                                Batch batch = new Batch(job.bid());
                                batch.setStatus(zooKeeper, org.cdlib.mrt.zk.BatchState.Failed);
                            } catch (Exception e2) {}
                        }
                        return new HandlerResult(false, "FAIL: " + NAME + " Submission failed: " + msg, 0);
                    }
                }
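
One possible shape for the retry logic is sketched here: a bounded number of attempts with a doubling delay between them, so that a burst of parallel submissions gets a chance to drain before the Job is failed. The `QueueRetry` class, the `withRetry` method, and the attempt and delay values are hypothetical and not part of the existing ingest code. The actual queue call inside HandlerSubmit is not shown in the segment above, so it is represented only as a generic `Callable`.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the existing ingest code base.
final class QueueRetry {
    private QueueRetry() {}

    // Runs the action up to maxAttempts times, sleeping between failed
    // attempts and doubling the delay each time. Rethrows the last
    // exception once all attempts have failed.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                System.err.println("[warn] queue submission attempt " + attempt
                        + " of " + maxAttempts + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw ie;
                    }
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

With something like this in place, the queue submission inside the `try` block could be wrapped in `QueueRetry.withRetry(...)`, and the existing `catch` block (which marks the Batch as Failed) would only be reached after every attempt has failed. Whether all exceptions should be retried, or only those that indicate a transient queue problem, is left open here.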
